ABSTRACT


This report presents a comprehensive summary of the problems under study, our approaches, technical contributions, and accomplishments supported throughout the entire project period, April 15, 2008 to July 31, 2014. The project comprises two periods: the original award from September 1, 2008 to August 31, 2011, and the supplemental period from September 1, 2011 to July 31, 2014. In the original project, we addressed two thrusts, namely network vulnerabilities and recovery strategies in the aftermath of WMD attacks. More specifically, this research targets a set of fundamental issues in understanding network response following an attack by WMD/WME: how to model a network topology in the presence of attacks/failures from random threats; how to estimate or predict network survivability to sustain critical applications; how to design/form a network architecture that approaches the theoretical limits of network robustness; and how to inter-operate with other available/limited sources for fast communication recovery.

National infrastructure, such as civilian and military telecommunication networks, power grids, and transportation systems, is receiving increasing attention, and these large-scale, inter-connected networks are vulnerable to WMD attacks. Under such attacks on communication media and facilities, the intrinsic nature of networking inevitably surrenders a given infrastructure to cascading or correlated failures in both temporal and spatial domains, which in turn have a great and potentially devastating impact on network operation availability and performance. Therefore, we proposed a two-year optional extension during which we focused on network vulnerability to such failures, and on assessing network availability subject to variations in traffic and user demands, as well as in critical elements. In particular, we proposed to explore correlated failures and their propagation in the temporal and spatial domains, in both inhomogeneous networks and physical systems.

To this end, we first studied the underlying network topology that is used to monitor, detect, and identify attacks, by analyzing the topological structure (i.e., operational state) of a network so as to (i) detect and localize failures very efficiently;¹ and (ii) address the so-called dynamic failures, and more specifically correlated failures, in order not only to detect and classify them early but also to track and isolate them, all by means of a distributed computational strategy that scales easily with the network size. We then explored structurally how cascading and correlated failures form and propagate in networks, and how such failures disrupt communications. We further moved on to the analysis and evaluation of physical networks by extending our models to smart grids, also known as the next-generation power grid. In short, our work included rapidly detecting, measuring, and tracking/predicting correlated failures in both the time and spatial domains, as well as optimal design against such failures for inhomogeneous networks and their applications in physical networks.

As a result, our efforts advance the knowledge and fundamental understanding of WMD attacks and of the resulting failures in the infrastructure networks that form the backbone of civilian and military capabilities. Furthermore, we believe that our results provide significant insights into the scope of damage due to cascading and correlated failures, including catastrophic loss of connectivity, unsuccessful missions, slow-down of the Internet, and damage to the economy and society at large. Altogether, this research will enhance DoD capability in response to attacks from WMD/WME.

1  This  is  more  than  an  order  of  magnitude  more  efficient  than  any  previous  approach. 
1 Objectives and Status of Efforts


Timely and accurate gathering and dissemination of information following an attack by Weapons of Mass Destruction (WMD) or Weapons of Mass Effect (WME) are crucial to preserving DoD military capabilities on the substratum of national infrastructure networks and military tactical networks. The destruction of infrastructure, combat, and tactical networks will be an immediate result of WMD stressors, which can be biological, chemical, or nuclear weapons. For example, a WMD attack may cause widely distributed failures of electronics from a nuclear electromagnetic pulse, or a long-term denial of network elements or segments due to WMD contamination. In addition to the impact of WMD stressors on the national network infrastructure, the impact of a nuclear electromagnetic pulse on wireless networks may be even more severe, as radio channels are an open medium for wireless communications. This is in part because wireless networks have become an indispensable element in civilian and military communications, and are almost exclusively the sole means of communication in disaster relief and/or battlefield settings. For military applications, relying on centralized systems with base stations and on an established network is simply not an option in light of typically hostile and dynamic (probably even unknown) environments. To overcome limited radio transmission ranges (i.e., wireless devices can only communicate with others within their transmission range), nodes are equipped with the ability to forward information on behalf of others, i.e., multihop communications, which led to the research and development of ad hoc networks, a field that has witnessed tremendous evolution in recent years. In addition, wireless ad hoc networks may serve as a temporary replacement for, or supplement to, the fixed infrastructure in reacting to failures and dynamic networking environments or applications.

Therefore, in the original project, we addressed two thrusts, namely network vulnerabilities and recovery strategies in the aftermath of WMD attacks. More specifically, this research targets a set of fundamental issues in understanding network response following an attack by WMD/WME: how to model a network topology in the presence of attacks/failures from random threats; how to estimate or predict network survivability to sustain critical applications; how to design/form a network architecture that approaches the theoretical limits of network robustness; and how to inter-operate with other available/limited sources for fast communication recovery. Accordingly, we have the following objectives:

•  We  aim  to  develop  new  analytical  models  in  order  to  capture  the  impact  of  multiple  failures  due  to  random 
threats  (aforementioned)  on  network  dynamics  in  the  face  of  WMD/WME  disruption,  such  as  connection 
(link)  status,  connectivity,  and  topology. 

• We aim to design and analyze new metrics that can characterize and estimate the impact of the interdependence of a multitude of failures on network responses, based on either peacetime data or simulated threats.

•  We  aim  to  develop  new  approaches  that  re-form  or  self-heal  a  network  architecture  given  complete  or 
incomplete  knowledge  of  communication  environments  and  function  of  other  nodes  (Note  that  we  use 
nodes  for  both  people  and  wireless  devices  in  this  context),  e.g.,  cooperative  nodes  or  non-cooperative 
neighboring  nodes  whose  cooperative  functions  may  be  disabled  by  energy  depletion  or  WMD  stressors. 
• We aim to design new distributed methods that can recover communications in heterogeneous networking environments by utilizing resources from other networks (sensor networks and back-haul networks) under circumstances of limited connection or outage due to unknown causes. By distributed, we mean that the designed solutions should enable each node to work with others independently, without requiring a centralized controller.

The first two objectives are under the thrust of understanding network response to WMD stressors, with
respect  to  the  rules  and  parameters  that  govern  network  vulnerability  and  survivability.  The  last  two  objectives 
are  under  the  thrust  of  recovery  strategies  to  re-form  a  network  and  to  inter-operate  with  other  networks.  In 
combination,  we  tackle  the  problem  of  identifying  fundamental  principles  that  govern  network  responses,  and  of 
facilitating  robust,  tactical  communication  networks  under  random  threats. 

With the increasing understanding of network responses to WMD failures and their impact, it is more and more clear that our national infrastructure, such as civilian and military telecommunication networks, power grids, and transportation systems, is vulnerable to large-scale attacks in WMD environments. Under such attacks on communication media and facilities, the intrinsic nature of networking inevitably surrenders a given infrastructure to cascading or correlated failures in both temporal and spatial domains, which in turn have a great and potentially devastating impact on network operation availability and performance. In the extension period of this project, we focus on network vulnerability to such failures, and on assessing network availability subject to variations in traffic and user demands, as well as in critical elements.

We note that there has been very limited work on the impact of tempo-spatially correlated and dynamic failures, which are quite unique to networks subjected to WMD environments. This is in part due to the lack of sound mathematical models of such failures, of an understanding of the complex issues arising in networks that cannot easily be described in terms of graphs and methodologies for simple, homogeneous networks, and of means for predicting inter-dependent network responses. Consequently, there exists a critical gap in the knowledge required for research on mitigating network vulnerabilities and their potentially dire consequences. Towards reducing this gap, we proposed to study the following issues in the extension period:

(1) How can correlated failures in the temporal and spatial domains be modeled and analyzed with respect to their formation, propagation, and evolution capability?

(2) How can network vulnerabilities in inter-dependent networks be measured or predicted? For instance, what is the temporal correlation between a cyber attack and a physical fault in a power grid?

(3) How can fast and distributed algorithms be designed to ensure network security by detecting, localizing, and tracking failures? We focused our work on sensor networks, which have a wide range of applications including surveillance and monitoring of potentially hazardous and inaccessible regions.

Status of Efforts: Through our productive research in the past five years, we have achieved significant research results toward the understanding and construction of robust network architectures that resist WMD/WME threats. As the first step of our research efforts, we designed a novel node behavior model to accurately capture the behavior transitions of a node under WMD attacks among states such as mobile, cooperative, faulty, destructive, and dead. The characterization of individual node behaviors lays the foundation for our advanced studies on network structures. Based on it, we further estimated the network survivability and communication feasibility under WMD occurrences and designed a PROActive routing protocol that maintains network performance while tolerating multiple node failures. Moreover, we thoroughly investigated the network counter-failure capability and the promptness of failure notification by defining the novel concepts of network devolution process and information propagation speed. In addition, a new topological analysis approach has been used to detect failures in data space, which has the advantages of early detection and accurate localization of failures.

In the extended period, we also:

(i) designed a low-complexity distributed algorithm for detecting and tracking such failures. We assume that nodes inside the failure region are either destroyed or unable to communicate with any other node. The algorithm does not assume any coordinate information for the nodes, and we evaluated it using simulations;

(ii) studied correlated failures by considering constant and generic impact radii and their distribution, focusing on persistent failures, which goes one step further than the one-time failures we studied earlier;

(iii) investigated the impact of cyber attacks, focusing on CSMA-CA based networks, since CSMA-CA is one of the most widely used access techniques in communication networks. In other words, the results of our study can be applied to a broad range of infrastructure networks. The key contribution of this work is that we quantify the impact of the attackers’ gain and thus determine which attacks would generate the most harmful disruption to the network, in contrast to many prior studies that focus on qualitative description and justification;

(iv) studied fast tracking of failures and identification of correlated failures in power grids. As a result, we are able to track a failure without any requirement on node density; and

(v) developed Greenbench, which integrates the power grid simulator PSCAD and the networking simulator OMNeT++. In this simulator, we also designed three data-centric attacks and studied cascading failures in power grids due to cyber attacks.

In  this  final  report,  we  present  the  research  issues  under  study,  our  approaches  and  technical  contributions  in 
the  following  five  categories: 

1. Mobility modeling and analysis of coverage properties in mobile networks.

2.  Network  connectivity  and  vulnerability:  modeling,  analysis,  and  countermeasure. 

3.  Detection,  localization,  and  tracking  of  systematic  failures. 

4.  Correlated  failures  and  their  propagation  in  inhomogeneous  networks. 

5.  Cascading  Failures  in  Power  Grids  due  to  Communication  and  Cyber  Attacks. 

Specifically, we focus on: (i) the design of a PROActive routing protocol that avoids delivering data to destructive or faulty nodes so as to maintain network operation and performance in the presence of multiple failures; (ii) the fundamental understanding of network topology devolution and transitions under random failures; (iii) the speed of information propagation in large-scale networks given the occurrence of a failure; and (iv) a new approach to characterizing network data through topological analysis. Among these four topics, the first belongs to the thrust of designing robust network architectures against failures and attacks, while the last three belong to the thrust of understanding network responses to failures with respect to fundamental limitations.
2 Mobility Modeling and Analysis of Coverage Properties

2.1  Semi-Markov  Smooth  (SMS)  Mobility  Model 

Node mobility, which by and large describes the presence of a node in a network, is a key factor that defines network architecture, especially in multi-hop wireless networks. Through mobility, a node can move around, thus yielding node-to-node radio links for communications. During the course of studying mobility-induced failures, we found that existing mobility models do not have the properties desired for our analysis. We were therefore motivated to design a new mobility model that abides by the physical law of moving objects to avoid abrupt moving behaviors, and that provides a microscopic view of mobility such that node mobility is controllable and adaptive to different network environments. In summary, this model is expected to unify the following desired features:

1. Smooth and sound movements: A mobility model should have temporal features, i.e., a mobile node’s current velocity depends on its moving history, so that movements are smooth, and mobile nodes should move at a stable speed without the average speed decay problem [1].

2. Consistency with the physical law of a smooth motion: In order to mimic the kinetic correlation between consecutive velocities at the microscopic level of a node, a mobility model should be consistent with the physical law of a smooth motion, in which there exist acceleration to start, stable motion, and deceleration to stop, for controllable mobility [2, 3].

3. Uniform nodal distribution: As most analytical studies of MANETs, such as those of network capacity and delay [4], network connectivity, topology control [5], and link change rate [6], are based on the assumption of uniform nodal distribution, a mobility model should generate a uniform spatial node distribution. Otherwise, the non-uniform node distribution caused by a mobility model may lead to misleading information and results [7].

4. Adaptation to diverse network application scenarios: In order to properly support rich MANET applications with complex node mobility and network environments, such as group mobility and geographic restriction, a generic mobility model that adapts to different mobility patterns is highly desirable.

Table 1 gives a detailed comparison of the properties of typical current mobility models and those of our proposed SMS model, where the independent mobility parameters of each mobility pattern are also included: speed ($V$), movement duration ($T$), destination ($D$), and direction ($\theta$).

Table 1: Properties of Different Mobility Models.

Attributes                 RW [8]           RWP [9]          RD [7]           SR [10]          GM [11]          SMS
Parameters                 V, $\theta$      V, D             V, $\theta$, T   V, $\theta$      V, $\theta$      V, $\theta$, T
Movement Phases            {moving, pause}  {moving, pause}  {moving, pause}  {moving, pause}  {moving}         {speed-up, middle smooth, slow-down, pause}
Smoothness                 No               No               No               Yes              Yes              Yes
Speed Decay                May              Yes              No               No               No               No
Uniform Node Distribution  Close            No               Yes              Close            Yes              Yes
Mobility Scale             Macroscopic      Macroscopic      Macroscopic      Microscopic      Microscopic      Microscopic
Unified Model              No               No               No               No               No               Yes
Controllability            Low              Low              Low              Medium           Medium           High

2.1.1 Model Description

Based on the physical law of a smooth motion, a movement in the SMS model contains three consecutive moving phases: the Speed Up phase, the Middle Smooth phase, and the Slow Down phase. After each movement, a mobile node may stay for a random pause time.

• Speed Up Phase ($\alpha$-Phase)

For every movement, an object needs to accelerate before reaching a stable speed. During the time interval $[t_0, t_\alpha] = [t_0, t_0 + \alpha\Delta t]$, an SMS node travels for $\alpha$ time steps. At the initial time $t_0$, the node randomly selects a target speed $v_\alpha \in [v_{\min}, v_{\max}]$, a target direction $\phi_\alpha \in [0, 2\pi]$, and the total number of time steps $\alpha \in [\alpha_{\min}, \alpha_{\max}]$. These three random variables are independently and uniformly distributed.

Figure 1: An example of speed vs. time in one SMS movement.

• Middle Smooth Phase ($\beta$-Phase)

In reality, after accelerating, a moving object should move smoothly at its stable velocity. Correspondingly, once the node transits into the $\beta$-phase at time $t_\alpha$, it randomly selects $\beta$ time steps to determine the middle smooth ($\beta$-phase) duration interval $(t_\alpha, t_\beta] = (t_\alpha, t_\alpha + \beta\Delta t]$, where $\beta$ is uniformly distributed over $[\beta_{\min}, \beta_{\max}]$. Within the $\beta$-phase, the mobility pattern at each time step is similar to that defined in the Gauss-Markov (GM) model [11].

• Slow Down Phase ($\gamma$-Phase)

In real life, every moving object needs to reduce its speed to zero before a full stop. In order to avoid a sudden stop in the SMS model, we let the SMS node go through a slow down phase to end one movement. In detail, once the node transits into the slow down phase ($\gamma$-phase) at time $t_\beta$, it randomly selects $\gamma$ time steps and a direction $\phi_\gamma \in [0, 2\pi]$, where $\gamma$ is uniformly distributed over $[\gamma_{\min}, \gamma_{\max}]$. In the $\gamma$-phase, the node evenly decelerates from $v_\beta$, the ending speed of the $\beta$-phase, to $v_\gamma = 0$ during $\gamma$ time steps.
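The three phases can be sketched as a per-step speed trace. The snippet below is a minimal illustration, not the implementation used in this project: it assumes a linear ramp in the $\alpha$-phase (the text only states even deceleration for the $\gamma$-phase), uses a Gauss-Markov style update with a hypothetical noise level `sigma` in the $\beta$-phase, and the function name `sms_speed_profile` and the default ranges are placeholders chosen for illustration.

```python
import math
import random


def sms_speed_profile(v_min=0.0, v_max=20.0, steps_range=(6, 30),
                      zeta=0.5, sigma=1.0, rng=None):
    """Return per-step speeds for one SMS movement and its phase lengths.

    Assumptions (not taken from the report): linear ramp in the alpha-phase
    and Gauss-Markov style noise around the target speed in the beta-phase.
    """
    rng = rng or random.Random()
    v_target = rng.uniform(v_min, v_max)   # target speed v_alpha
    alpha = rng.randint(*steps_range)      # number of speed-up steps
    beta = rng.randint(*steps_range)       # number of middle-smooth steps
    gamma = rng.randint(*steps_range)      # number of slow-down steps

    speeds = []
    # alpha-phase: accelerate from 0 to v_alpha in alpha steps
    for k in range(1, alpha + 1):
        speeds.append(v_target * k / alpha)
    # beta-phase: fluctuate around v_alpha with memory parameter zeta
    v = v_target
    for _ in range(beta):
        noise = rng.gauss(0.0, sigma)
        v = zeta * v + (1.0 - zeta) * v_target + math.sqrt(1.0 - zeta ** 2) * noise
        v = max(v, 0.0)                    # a speed cannot be negative
        speeds.append(v)
    # gamma-phase: decelerate evenly from the last beta speed to 0
    v_beta = speeds[-1]
    for k in range(1, gamma + 1):
        speeds.append(v_beta * (1.0 - k / gamma))
    return speeds, (alpha, beta, gamma)
```

Calling `sms_speed_profile(rng=random.Random(1))` returns one speed value per time step, which can be plotted against the step index to obtain a curve of the kind described for one SMS movement.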

2.1.2  Semi-Markov  Process  of  SMS  Model 

We consider pause as another phase, so the stochastic process of the SMS model is described as an iterative four-state transition process. Let $I = \{I_\alpha, I_\beta, I_\gamma, I_p\}$ denote the set of phases in an SMS movement, and let $I(t)$ denote the phase of the SMS process at time $t$. Accordingly, $\{Z(t);\, t \ge 0\}$ denotes the process that makes transitions among phases in the stochastic modeling of SMS movements. Since the transition time between consecutive moving phases (states), i.e., the phase duration time, has a discrete uniform distribution instead of an exponential distribution, $\{Z(t)\}$ is a semi-Markov process [12]. This is why our mobility model is called the Semi-Markov Smooth model: it is governed by a semi-Markov process, and it complies with the physical law of smooth movement. Let $\pi = (\pi_\alpha, \pi_\beta, \pi_\gamma, \pi_p)$ denote the time stationary distribution of the SMS process. Then, the time stationary distribution for each phase of the SMS model is:

\[
\pi_m = \lim_{t \to \infty} \mathrm{Prob}\{I(t) = I_m \in I\} = \frac{E\{T_m\}}{E\{T\} + E\{T_p\}}, \tag{1}
\]

where $E\{T_m\}$ is the expected duration of the $m$-phase in an SMS movement, and $E\{T\}$ and $E\{T_p\}$ are the expected SMS movement period and pause period, respectively. Specifically, $E\{T\} = E\{\alpha\Delta t\} + E\{\beta\Delta t\} + E\{\gamma\Delta t\}$. Since $\Delta t$ is a constant unit time, for the sake of simplicity $\Delta t$ is normalized to 1 second in the rest of this report.
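As a small numerical illustration of the stationary distribution (1), the sketch below computes the phase probabilities from expected phase durations with $\Delta t$ normalized to 1 second. The helper names are hypothetical, and the example values (each moving phase uniform on 6 to 30 steps, zero pause) are chosen only for illustration.

```python
def uniform_int_mean(lo, hi):
    """Mean of a discrete uniform distribution on {lo, ..., hi}."""
    return (lo + hi) / 2.0


def sms_stationary_distribution(e_alpha, e_beta, e_gamma, e_pause):
    """Time-stationary phase probabilities (pi_alpha, pi_beta, pi_gamma, pi_p).

    Each entry is E{T_m} / (E{T} + E{T_p}) with Delta t normalized to 1 s.
    """
    total = e_alpha + e_beta + e_gamma + e_pause
    if total <= 0:
        raise ValueError("expected durations must sum to a positive value")
    return tuple(e / total for e in (e_alpha, e_beta, e_gamma, e_pause))


# Example: every moving phase lasts 6 to 30 steps, zero pause time.
e_phase = uniform_int_mean(6, 30)  # 18 steps on average
pi = sms_stationary_distribution(e_phase, e_phase, e_phase, 0.0)
print(pi)
```

With equal expected durations and no pause, each moving phase is occupied one third of the time and the pause probability is zero.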

2.1.3  Analysis  of  Steady-State  Speed  and  Node  Distribution 

To  generate  stable  nodal  movements,  a  sound  mobility  model  should  select  the  speed  independently  from  travel 
times  [1],  which  is  exactly  what  occurs  in  SMS  model.  Here,  we  evaluate  the  stochastic  property  of  steady-state 
speed  in  SMS  model  and  verify  that  SMS  model  can  eliminate  speed  decay  problem  and  achieve  stable  nodal 
movements.  In  order  to  find  out  whether  there  exists  the  speed  decay  phenomenon  in  SMS  model,  it  is  necessary 
to  obtain  both  initial  average  speed  E{vini}  and  average  steady-state  speed  E{vss}. 
At the initial stage, each node starts from an SMS phase with a state probability given by the time stationary distribution of the SMS process. The average speed in each moving phase of an SMS movement is obtained as $E_{I_\alpha}\{v\} = E_{I_\gamma}\{v\} = \frac{1}{2} E_{I_\beta}\{v\} = \frac{1}{2} E\{v_\alpha\}$. The CDF of the steady-state speed, $\Pr\{V_{ss} < v\}$, can then be derived from the limiting fraction of time during which the step speeds of a node are less than $v$, as the simulation time $t$ approaches infinity. Let $M(t)$ and $M_p(t)$ denote the total number of time steps that a node travels and pauses during $[0, t]$, respectively. Thus, $\Pr\{V_{ss} < v\}$ is derived as:

\[
\Pr\{V_{ss} < v\} = \lim_{t \to \infty} \frac{\sum_{n=1}^{M(t)} \mathbf{1}_{\{v_n < v\}} + \sum_{n=1}^{M_p(t)} \mathbf{1}_{\{v_n < v\}}}{M(t) + M_p(t)}, \tag{2}
\]

where $\mathbf{1}_{\{\cdot\}}$ is the indicator function: $\mathbf{1}_{\{v_n < v\}} = 1$ if the event $\{v_n < v\}$ is true, and $\mathbf{1}_{\{v_n < v\}} = 0$ otherwise. Finally, we can obtain:

\[
E\{v_{ss}\} = E_{I_\alpha}\{v_{ss}\} + E_{I_\beta}\{v_{ss}\} + E_{I_\gamma}\{v_{ss}\} + E_{I_p}\{v_{ss}\} = \frac{\frac{1}{2} E\{v_\alpha\}\,\bigl(E\{\alpha\} + 2E\{\beta\} + E\{\gamma\}\bigr)}{E\{T\} + E\{T_p\}}. \tag{3}
\]

We observe that the average initial speed is exactly the same as the average steady-state speed in the SMS model, i.e., $E\{v_{ini}\} = E\{v_{ss}\}$. Therefore, the SMS model does not have the speed decay problem.
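The equality of the initial and steady-state average speeds can be checked numerically. The sketch below evaluates the closed form (3) and, separately, the phase-weighted average obtained from the stationary distribution (1) and the per-phase mean speeds. The function names are hypothetical, and the target speed of 10 m/s and the phase durations of 18 steps are assumed values used only for illustration.

```python
def phase_mean_speeds(e_v_target):
    """Mean speed per phase: half the target speed while ramping, full in beta."""
    return {"alpha": 0.5 * e_v_target, "beta": e_v_target,
            "gamma": 0.5 * e_v_target, "pause": 0.0}


def steady_state_speed(e_v_target, e_alpha, e_beta, e_gamma, e_pause):
    """Closed form (3): 0.5 E{v_a} (E{a} + 2 E{b} + E{g}) / (E{T} + E{T_p})."""
    e_move = e_alpha + e_beta + e_gamma
    return 0.5 * e_v_target * (e_alpha + 2.0 * e_beta + e_gamma) / (e_move + e_pause)


def initial_speed(e_v_target, e_alpha, e_beta, e_gamma, e_pause):
    """Average over phases, weighted by the stationary distribution (1)."""
    durations = {"alpha": e_alpha, "beta": e_beta,
                 "gamma": e_gamma, "pause": e_pause}
    total = sum(durations.values())
    means = phase_mean_speeds(e_v_target)
    return sum(durations[m] / total * means[m] for m in durations)


# Assumed example values: E{v_alpha} = 10 m/s, 18 steps per phase, no pause.
v_ss = steady_state_speed(10.0, 18.0, 18.0, 18.0, 0.0)
v_ini = initial_speed(10.0, 18.0, 18.0, 18.0, 0.0)
print(v_ss, v_ini)
```

Both quantities evaluate to two thirds of the assumed target speed, which is the sense in which a node started from the stationary distribution shows no speed decay.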

Since an SMS node selects direction, speed, and phase time independently, the SMS model can be considered an enhanced random direction (RD) model with memory and microscopic properties in step speed and direction. The RD model was proved to maintain a uniform node distribution in [7]. Here, we prove that the SMS model also yields a uniform node distribution. We evenly distribute all mobile nodes in the simulation region at the initial time. For simplicity of representation, we normalize the size of the simulation region to $[0, 1)^2$. Let $(X_j, Y_j)$, $v_j$, and $\phi_j$ denote the ending position, speed, and direction in a node’s $j$th step of its first movement, respectively. When an SMS node reaches a boundary of the simulation region, it wraps around and reappears instantaneously at the opposite boundary, moving in the same direction, to avoid biased simulation results. Under this border-wrap condition, we have the following lemma:


Lemma 1. In the SMS model, if the initial position $P(0)$ and the first target direction $\phi_\alpha$ of a mobile node are chosen independently and uniformly distributed on $[0, 1)^2 \times [0, 2\pi)$ at time $t = 0$, then the location and direction of the node remain uniformly distributed at all times.


Given that the initial position $(X_0, Y_0)$ and direction $\phi_\alpha$ of a node are independently and uniformly distributed, the joint probability of the ending position and direction of the node’s first step is:

\[
\begin{aligned}
&\Pr(X_1 < x_1,\, Y_1 < y_1,\, \phi_1 < \theta) \\
&\quad = \Pr(X_1 < x_1 \mid \phi_1 < \theta) \cdot \Pr(Y_1 < y_1 \mid \phi_1 < \theta) \cdot \Pr(\phi_1 < \theta) \\
&\quad = \frac{1}{2\pi} \int_{\phi_1 = 0}^{\theta} \int_{x_0 = 0}^{1} \mathbf{1}_{\{x_0 + v_1\cos(\phi_1) - \lfloor x_0 + v_1\cos(\phi_1) \rfloor < x_1\}} \, dx_0 \int_{y_0 = 0}^{1} \mathbf{1}_{\{y_0 + v_1\sin(\phi_1) - \lfloor y_0 + v_1\sin(\phi_1) \rfloor < y_1\}} \, dy_0 \, d\phi_1 \\
&\quad = \frac{x_1 y_1 \theta}{2\pi}.
\end{aligned} \tag{4}
\]

The result in (4) shows that $(X_1, Y_1)$ and $\phi_1$ are uniformly distributed on $[0, 1)^2 \times [0, 2\pi)$. Following the same methodology, by induction on each subsequent step, Lemma 1 is proved. The detailed proof is given in [13].
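The effect of border wrap in Lemma 1 can also be checked empirically. The following sketch starts nodes uniformly on the unit square, moves each one a few steps with wrap-around, and histograms the resulting x-coordinates. It is a simplified illustration rather than the SMS model itself: the direction is held fixed across steps, the step lengths are drawn from a hypothetical normalized range, and the function names are placeholders.

```python
import math
import random


def wrapped_step(x, y, speed, phi):
    """Move one step on the unit square and wrap around at the borders."""
    return (x + speed * math.cos(phi)) % 1.0, (y + speed * math.sin(phi)) % 1.0


def histogram_after_steps(n_nodes=50_000, n_steps=3, bins=10, seed=0):
    """Histogram of x-coordinates after a few wrapped steps from a uniform start."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n_nodes):
        x, y = rng.random(), rng.random()
        phi = rng.uniform(0.0, 2.0 * math.pi)
        for _ in range(n_steps):
            speed = rng.uniform(0.0, 0.05)  # hypothetical normalized step length
            x, y = wrapped_step(x, y, speed, phi)
        # min() guards against x rounding up to exactly 1.0 after the modulo
        counts[min(int(x * bins), bins - 1)] += 1
    return counts


counts = histogram_after_steps()
print(counts)
```

Each bin should hold roughly one tenth of the nodes, since adding an independent displacement to a uniform coordinate and wrapping it back into $[0, 1)$ leaves it uniform.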

2.1.4  Simulation  Results  and  Model  Comparisons 

Here, we verify the above theoretical analysis of the SMS model by simulations and compare the results with the RWP and GM models. We integrated our SMS model into the setdest tool of the ns-2 simulator, which currently provides both an original and a modified version of the RWP model. To compare simulation results between the RWP and SMS models, 1000 mobile nodes move in an area of 1401 m × 1401 m during a time period of 1500 seconds. For a better demonstration, we simulated both the SMS model and the original RWP model with zero pause time. Both the GM and SMS models set the time slot $\Delta t$ to 1 second and the memory parameter $\zeta$ to 0.5. In the SMS model, the duration of each moving phase ranges over [6, 30] seconds.

• Average Speed

Here, we compare the average speed of the SMS model with that of the RWP model to validate our analytical proof in Section 2.1.3. To obtain the average node speed, we first calculate the average speed of each node within every 10 seconds, and then calculate the average speed among all the nodes. The corresponding numerical results of average speed over a time period of 1500 seconds are shown in Figure 2. Given the simulation condition of zero pause time and $E\{\alpha\} = E\{\beta\} = E\{\gamma\}$, from (3), the theoretical value of $E\{v_{ss}\}$ for the SMS model is $E\{v_{ss}\} = \frac{2}{3} E\{v_\alpha\} = 6.7$ m/sec. From Figure 2, we observe that the average speed of the SMS model is stable from the beginning of the simulation at around 6.7 m/sec, which matches the theoretical result. Therefore, the simulation results validate our analytical conclusion that the average speed of the SMS model does not decay over time. In contrast, the average speed of the RWP model keeps decreasing as the simulation time progresses, which is its well-known average speed decay problem [14].

• Spatial Node Distribution

To verify our derivation of the node distribution, we distribute nodes uniformly in the simulation region at the initial time. Then, we sample the node positions at the 500th second for the SMS model, and at the 1000th second for both the RWP and SMS models. Top views of the two-dimensional spatial node positions of the RWP and SMS models are shown in Figure 3. The results for the RWP model in Figure 3(a) show that the node density is highest at the center of the region, while it is almost zero near the network boundary, which agrees with the previous study [15]. In contrast, in Figures 3(b) and 3(c), the two node density samples of the SMS model at different time instants are similar, and mobile nodes are evenly distributed in the simulation region. Since these two time instants are arbitrarily selected, the results support our proof that the SMS model with border wrap maintains a uniform spatial node distribution over time.
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Figure  2:  Average  speed  vs.  simulation  time. 


(a)  RWP  2-dimensional  at  the  1000th  sec.  (b)  SMS  2-dimensional  at  the  500th  sec. 


(c)  SMS  2-dimensional  at  the  1000th  sec. 


Figure  3:  Top-View  of  node  distribution  of  the  RWP  model  at  the  1000th  sec  and  the  SMS  model  at  the  500th 
and  the  1000th  sec,  respectively. 
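
The average-speed measurement used for Figure 2 (per-node averages over 10-second windows, followed by an
average over all nodes) can be sketched as follows. This is only an illustration and not the simulator used in
the project: the function name `windowed_average_speed`, the track format, and the assumption that positions
are logged before border wrapping (so that consecutive samples are physically adjacent) are all hypothetical.

```python
import math


def windowed_average_speed(tracks, dt, window):
    """Average node speed per time window.

    tracks: one list of (x, y) positions per node, sampled every `dt` seconds
            (assumed unwrapped, i.e. logged before any border wrap is applied).
    window: averaging window in seconds (10 s in the text).
    Returns one network-wide average speed per complete window.
    """
    steps_per_window = int(round(window / dt))
    n_steps = len(tracks[0]) - 1
    averages = []
    for start in range(0, n_steps - steps_per_window + 1, steps_per_window):
        per_node = []
        for positions in tracks:
            # distance travelled by this node inside the window
            travelled = sum(
                math.dist(positions[k], positions[k + 1])
                for k in range(start, start + steps_per_window)
            )
            per_node.append(travelled / window)
        # average over all nodes for this window
        averages.append(sum(per_node) / len(per_node))
    return averages
```

Plotting the returned list against the window start times gives a curve of the kind shown in Figure 2.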


2.1.5  Comparison 

We also compare the simulation results of link lifetime distribution and average node degree among the RWP,
GM, and SMS models. We find that the probability mass function (PMF) of link lifetime for both the SMS model
and the GM model decreases exponentially with time. In contrast, the link lifetime distribution of the RWP
model has a peak at the 25th second. Hence, mobility models with a macroscopic mobility pattern, such as RWP,
have different link and path properties from those of mobility models with a microscopic pattern, such as the GM
and SMS models. Therefore, the SMS model is more accurate than other models for the simulation of link lifetime
in MANETs. For the average node degree, we find that the result obtained with the RWP model is noticeably larger
than that obtained with the GM and SMS models. This is because the majority of nodes move into the center region
in the RWP model as the simulation time proceeds. This means that a network connectivity evaluation based on the
RWP model could be overly optimistic.
Therefore, the SMS model, with its uniform node distribution, is preferable for network connectivity studies in MANETs.


2.2  Coverage  Analysis  of  Mobile  Networks 

In this part of the project, we investigated coverage properties of mobile networks. Consider the scenario where
a set of mobile nodes follows a certain mobility pattern in a region of interest. A natural problem that
arises in these cases is the measurement of the coverage properties of the network over time. Previous work in this
area was concerned with “snapshot” statistics, such as the average area covered at a given time or the time taken
until every point in the region is covered at least once. However, statistics such as the average time for which a
coverage hole remains in the network until it disappears or merges with some other hole have not been addressed in
the literature. The latter statistics were the target of this research, and we outline our results below.

2.2.1  Problem  formulation 

The coverage information of the mobile network is summarized in a ‘barcode’ describing the birth and death times of
homological features in the network over time, and we describe the relationship between these features and the
coverage properties. The barcode is obtained by employing a method from the mathematical field of computational
topology, called zigzag persistent homology [16]. Specifically, the network at each time $t_i$ is modeled as
a simplicial complex $K_{t_i}$, and the relation between the network at two consecutive times $t_i$ and $t_{i+1}$ is inferred
through their inclusions into the union. This results in a “zigzag” diagram as shown below:

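$$K_{t_1} \hookrightarrow K_{t_1} \cup K_{t_2} \hookleftarrow K_{t_2} \hookrightarrow K_{t_2} \cup K_{t_3} \hookleftarrow \cdots \hookrightarrow K_{t_{T-1}} \cup K_{t_T} \hookleftarrow K_{t_T} \qquad (5)$$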

This sequence of spaces and inclusion maps in turn produces a sequence of homology spaces and linear
maps as follows,


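$$H_1(K_{t_1}) \rightarrow H_1(K_{t_1} \cup K_{t_2}) \leftarrow H_1(K_{t_2}) \rightarrow H_1(K_{t_2} \cup K_{t_3}) \leftarrow \cdots \rightarrow H_1(K_{t_{T-1}} \cup K_{t_T}) \leftarrow H_1(K_{t_T})$$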

Briefly,  topological  features  such  as  coverage  holes  at  a  given  time  are  reflected  in  the  dimension  of  the 
respective  homology  space,  and  their  persistence  over  time  may  be  observed  through  the  analysis  of  the  induced 
linear maps. The result of this analysis is summarized in a barcode, as shown for example in Figure 6, from
which  the  statistical  properties  may  be  determined. 

2.2.2  Results 

We showed that zigzag persistence may be used to perform the following tasks in a coordinate-free setting:

1. Comparison of mobility patterns.

2. Coordinate-free estimation of hole size.

3. Tracking individual coverage holes.

Comparison of mobility patterns  The length of each bar in the barcode signifies the time for which a coverage hole in
the network persists before being covered or merging with some other hole. Therefore, we would expect the
distribution of the times for which these holes persist to vary with the mobility pattern. This was indeed the case:
as shown in Figure 4, we were able to distinguish very clearly between Discrete Brownian motion [17] and
Straight Line motion [17, 18].


Figure 4: LTcounts for the Discrete Brownian (top left) and Straight Line (bottom left) mobility patterns, as
well as the paired difference (DB-SL) in LTcounts (right), with the lifetimes whose frequency has a statistically
significant difference between the groups highlighted. The lifetimes that occur more frequently in the Discrete
Brownian pattern (t = 1, 19, 22 and 50) are highlighted in red in the top plot, and those that occur more frequently
in the Straight Line pattern (t = 4, ..., 13) are highlighted in green in the bottom plot.

Coordinate-free estimation of hole size  The barcode obtained from zigzag persistence gives us a quantitative
descriptor for the time-varying coverage of a network. However, the presence of a long bar in the barcode
may or may not correspond geometrically to a large hole. Given that our network is described as a sequence of
adjacency matrices (describing the simplicial complex at each snapshot, but without coordinate information), the
best estimate available is the hop-length of the shortest cycle surrounding a hole. This can be obtained without
having to compute the shortest cycle explicitly, by performing a hop-distance filtration on the simplicial complex
(at each time point). In other words, given a cycle in the graph which surrounds a coverage hole, we successively
add edges between nodes which are multiple hops apart and check whether the cycle is “filled in”; the depth in the
filtration at which this happens gives an estimate of the size of the coverage hole. In practice, we observe a
correlation of up to r = 0.747 between hole size estimated this way and the actual geometric area, as shown in Figure 5.


[Figure 5 panels, left to right: number of holes ($\beta_1$), correlation 0.17979; sum of hole sizes (hop dist),
correlation 0.50828; sum of squared hole sizes (hop dist), correlation 0.74768.]

Figure 5: Relationship between coverage hole area (measured as proportion of total area) and various homological
features. Left: first Betti number (r = 0.176); middle: sum of hole sizes (measured using depth in hop
distance filtration, r = 0.505); right: sum of squared hole sizes (measured using depth in hop distance filtration,
r = 0.747).
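
The hop-distance filtration idea can be illustrated with a small sketch. It is a simplification under stated
assumptions and not the project's implementation: it treats a network with a single hole, builds the clique
complex of the graph in which nodes at most k hops apart are joined, computes the first Betti number over GF(2),
and reports the smallest k at which that number drops to zero. All function names (`hop_distances`, `gf2_rank`,
`betti1`, `filtration_depth`, `cycle_graph`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj):
    """All-pairs hop distances by breadth-first search; adj maps node -> set of neighbours."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> stored row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)


def betti1(nodes, edges):
    """First Betti number (mod 2) of the clique complex of the graph (nodes, edges)."""
    edge_index = {frozenset(e): i for i, e in enumerate(edges)}
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(nodes)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    rank_d1 = len(nodes) - components  # rank of the edge -> vertex boundary map
    triangles = []
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(s in edge_index for s in sides):
            # boundary of the triangle as a bitmask over the edges
            triangles.append(sum(1 << edge_index[s] for s in sides))
    return len(edges) - rank_d1 - gf2_rank(triangles)


def filtration_depth(adj):
    """Smallest hop threshold k at which every 1-cycle is filled in, or None."""
    dist = hop_distances(adj)
    nodes = sorted(adj)
    max_hops = max(max(d.values()) for d in dist.values())
    for k in range(1, max_hops + 1):
        edges = [(u, v) for u, v in combinations(nodes, 2)
                 if dist[u].get(v, float("inf")) <= k]
        if betti1(nodes, edges) == 0:
            return k
    return None


def cycle_graph(n):
    """Ring of n nodes, used here as a toy network enclosing one hole."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
```

On ring-shaped toy networks the depth grows with the ring length, which is the sense in which the depth acts as a
coordinate-free size estimate. Only adjacency information is used, as in the setting described above.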

Tracking individual coverage holes  Intuitively, our method aims to compute a ‘canonical basis’, i.e., a set of
cycles in which there is one representative cycle surrounding each hole. Given the Rips complex for a static sensor
network, without an embedding or geometric information, such a canonical basis is impossible to obtain. In the
time-varying setting, however, a small amount of ‘canonical’ information is available: when a coverage hole is
first formed by the removal of a 2-simplex (triangle), the boundary of that triangle is known to surround exactly
the hole of interest. The idea behind our method is then to use that boundary as the representative cycle for the
homology class at its birth time, and to propagate that information forward through the sequence of complexes as
well as possible.

Figure  6  illustrates  a  network  which  is  initially  fully  covered,  and  has  a  number  of  small  coverage  holes 
appearing  over  time,  one  of  which  is  persistent.  The  barcode  displaying  lifetimes  of  homological  features  can  be 
seen  in  the  top  left,  with  the  bars  color-coded  to  correspond  to  their  associated  representative  cycles  in  the  other 
figures.  It  can  be  seen  that  each  representative  cycle  remains  relatively  tight  around  one  coverage  hole,  and  the 
set  of  cycles  does  correspond  to  a  canonical  basis  at  each  time  point.  Overall,  when  a  network  is  dense  enough 
that  its  coverage  holes  appear  and  disappear  in  an  isolated  fashion  (as  opposed  to  splitting  and  merging  with  other 
holes),  this  method  performs  very  well.  Preliminary  results  have  appeared  in  [19]. 

2.2.3  Summary 

We have worked on developing an analysis of various aspects of networks under the effect of catastrophic failures
such as those caused by WMDs. Starting with networks with static nodes and static topology, we have developed
distributed and coordinate-free algorithms to detect and localize failures. For networks with time-varying
topology, we have shown that the boundary update, and the subsequent tracking of a spreading failure, may be
performed using only the edge length information in real time. Building on our success in these scenarios, we
further developed methodologies for quantifying and tracking the coverage in mobile networks (where the nodes
themselves are also in motion).

Figure 6: Representative cycles for the intervals obtained using zigzag persistence in a dense network, with
color-coding between the representative cycle at each time point and its corresponding interval (barcode: top
left).

This result demonstrated that the integration of traditional graph theory, distributed algorithms, and the
emerging field of algebraic topology serves as an effective set of tools for robust analysis of networks. The
funded effort herein has focused on sensor networks, the nature of which imposes certain geometric constraints
on the construction of the network. On the other hand, there are many important networks in use which
do not necessarily follow these geometric constraints, and these networks are often interconnected and
interdependent. In the future, we plan to build on the accomplished work, and on what we have learned from it, to
develop algorithms and techniques which may address further issues in cyber-physical networks, where control
issues also have to be carefully addressed.


3  Network Connectivity and Vulnerability: Modeling, Analysis, and Countermeasure

3.1  Understanding  the  Impact  of  Multi-Failure  on  Network  Architecture 

The impact of mobile node behaviors in ad hoc networks has been discussed in recent studies. For example,
in [20] a distributed and scalable acceptance algorithm called GTFT was proposed to enable nodes to decide
whether to accept or reject relay requests in the context of cyber-attacks. In [21], DoS attacks launched by
malicious nodes, Jellyfish and Blackhole, were shown to have a network partitioning effect that degrades network
performance. Because routing is the basic function in a wireless ad hoc network, every node action, such as
node movement, misbehavior, or attack, will ultimately affect routing procedures. For example, node mobility,
which is an intrinsic feature of mobile ad hoc networks, may trigger routing table updates and path re-selection.
On the other hand, routing operations may be interrupted or distorted by node misbehavior: a selfish node may
not forward packets for other nodes; a DoS attacking node may reorder or drop packets; and a failed node may
not respond to route discovery messages. As a result, routing protocols cannot operate correctly and the network
suffers from communication malfunction. Therefore, we focus our efforts on the study of network
survivability in the presence of multiple failures that are due to node mobility, faulty communication, and dead
nodes in the aftermath of attacks.

3.1.1  Characterization  of  Malfunction  of  Nodes  and  Operation 

Current modeling approaches and models have two major limitations: the memoryless property assumed for
each node’s status, and the assumed independence of node failures under threats. First, most of these models are
based on the assumption of an exponential distribution of individual events, e.g., the transition time between two
node failures follows an exponential distribution. Unfortunately, failure events often do not follow exponential
distributions in real-time distributed systems [22, 23, 24]. Such models are hence unable to reflect the correlation
of multiple failures and their dependencies, since a plain exponential distribution is memoryless. Second, most of
these models can describe single, independent failures only, e.g., an energy model is used to estimate the
probability of power outage; mobility models are used to estimate link lifetime; traffic models are used to analyze
throughput, and so on. In summary, existing modeling approaches fail to describe diverse, correlated, random threats
in the face of WMD attacks, though they are appropriate for understanding basic performance issues of wireless
networks. A large number of potential failures are left out of these models, including connection breakage due to
the combined effect of mobility, power outage, channel variability, incomplete knowledge of the operation
capabilities of other nodes, and faulty network protocol behaviors.


3.1.2  Definition  of  Network  Survivability 

The survivability of a network refers to the ability of the network to resist failures due not only to physical
damage, operational errors, or misconfiguration, but also to adversaries. To keep a wireless ad hoc network
in operation, it is necessary to keep the network connected whenever practical. Hence, we employ network
connectivity as the measure of survivability for wireless ad hoc networks. In this work, we denote a wireless
ad hoc network by $\mathcal{M}$ and its node set by $\mathcal{N}$. For a $k$-connected network $\mathcal{M}$, the maximum value of $k$ is
defined as the connectivity of $\mathcal{M}$, denoted by $\kappa(\mathcal{M})$ (see graph connectivity in [25]). Then we define the network
survivability as follows:

Definition 1. The network survivability of $\mathcal{M}$, denoted by $NS_{\mathcal{M}}(k, N)$, is defined as the conditional probability
that $\mathcal{M}$ is $k$-connected conditional on the system size $|\mathcal{N}|$, that is

$$NS_{\mathcal{M}}(k, N) = \Pr\{\kappa(\mathcal{M}) = k \mid |\mathcal{N}| = N\}, \qquad (6)$$

where $|\mathcal{N}|$, the cardinality of $\mathcal{N}$, is a random variable and $N$ is a value of the system size.

In this definition, network survivability is a probabilistic measure of network connectivity, and is affected by
multiple failures due to various causes in wireless ad hoc networks. The challenge in solving this problem is to
take communication malfunction into account, rather than only deriving the probability of node isolation as a
proxy for network connectivity, as in some previous work [26, 27], where a node is isolated only when it has no
active neighbors. Our approach to this problem is composed of three steps: (i) we study the evolution of
node behaviors with potential routing malfunction, aiming in particular to find the stochastic properties of node
behaviors, (ii) we investigate the node isolation problem due to the effect of abnormal routing operations, and
(iii) based on the understanding of node isolation, we derive the network survivability.

3.1.3  Node  Behavior  Modeling 

Our modeling of multiple failures is based on a rigorous classification of the malfunction of nodes and network
protocols as a result of random threats. We found that node behavior can be modeled by a Semi-Markov Process
(SMP) to characterize transient and steady failures over time [28, 29]. We first assign each node to one of four
classes depending on its status:

•  Cooperative state (C): a node is said to be cooperative when it can correctly perform all designed
communication/routing functions. This node status is expected during peacetime without WMD attacks or
stressors.

•  Faulty state (F): a node is said to be faulty if it cannot correctly perform all functions, which can
be a result of power depletion and/or electronic troubles. However, these nodes do not initiate or trigger
attacks on others, even though they may cause cascading failures, e.g., when one node cannot forward information
on time, its neighbors are unable to correctly receive data and may further make excessive requests that
jam radio links.

•  Destructive state (D): a node is said to be destructive if it appears to be cooperative (e.g., implements control
messages correctly), but in fact interrupts normal communication functions by delaying, reordering, or
dropping packets, or even by sending fake routing messages and exhibiting other faulty network protocol behaviors,
such as launching denial-of-service (DoS) attacks [30].

•  Inactive state (I): a node is said to become inactive or failed if it cannot participate in any communication
with others. This may be a result of a dead device, or of being isolated from others due to a node moving out
of the transmission range of others, i.e., the effect of node mobility. Nodes of this type, however, would not
launch any attacks.

Node status is thus described according to node function, subject to many random threats. Let
$S = \{C, F, D, I\}$ be the state space with the following properties: (1) a node in state $i$ will enter state $j$ with
probability $p_{ij}$; and (2) given that the next state to be entered is state $j$, the transition time from state $i$ to $j$
($i, j \in S$) follows a general distribution $F_{ij}(t)$. Therefore, the node behavior model can be better described as a
Semi-Markov Process (SMP), denoted by $\{Z(t), t \ge 0\}$, with a matrix of transition functions
$Q_{ij}(t) = p_{ij} \cdot F_{ij}(t)$, $i, j \in S$ [28]. With state space $S$, a discrete time Markov chain (DTMC), denoted by
$\{X_n, n \ge 0\}$, can be constructed with transition probability matrix $P = (p_{ij})$, which is the embedded Markov
chain of the Semi-Markov process $Z(t)$.

Figure  7  depicts  the  node  behavior  model  defined  above. 


Figure  7:  The  semi-Markov  process  for  node  behavior  evolution. 
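
The two defining properties of the SMP (pick the next state j with probability p_ij, then draw the transition
time from F_ij) translate directly into a simulation loop. The sketch below is illustrative only: the transition
matrix `P`, the Weibull parameters in `sample_transition_time`, and the function names are hypothetical
placeholders and are not values from this project.

```python
import random

STATES = ["C", "F", "D", "I"]

# Hypothetical embedded-chain probabilities p_ij (each row sums to 1).
P = [
    [0.10, 0.40, 0.20, 0.30],  # from C
    [0.50, 0.10, 0.20, 0.20],  # from F
    [0.00, 0.00, 0.30, 0.70],  # from D (never moves directly to C or F)
    [0.80, 0.00, 0.00, 0.20],  # from I (repair or recharge leads back to C)
]


def sample_transition_time(i, j, rng):
    """Placeholder for F_ij: a Weibull draw with scale 10.0 and shape 1.5."""
    return rng.weibullvariate(10.0, 1.5)


def simulate_occupancy(P, sample_time, start, horizon, rng):
    """Simulate one SMP path up to `horizon`; return the time spent in each state."""
    occupancy = [0.0] * len(P)
    state, clock = start, 0.0
    while True:
        # (1) choose the next state j with probability p_ij
        nxt = rng.choices(range(len(P)), weights=P[state])[0]
        # (2) given j, draw the transition time from F_ij
        dwell = sample_time(state, nxt, rng)
        if clock + dwell >= horizon:
            occupancy[state] += horizon - clock
            return occupancy
        occupancy[state] += dwell
        clock += dwell
        state = nxt


occupancy = simulate_occupancy(P, sample_transition_time, 0, 10_000.0, random.Random(1))
fractions = {s: t / 10_000.0 for s, t in zip(STATES, occupancy)}
```

For a long horizon, `fractions` gives an empirical estimate of the long-run share of time a node spends in each
state.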

•  Stochastic Properties of Node Behavior Model.

In particular, we are interested in the probability that $Z(t)$ is in a certain state $i$, i.e.,
$P_i = \lim_{t \to \infty} P(Z(t) = i \mid Z(0) = j)$. Nevertheless, the existence of the limiting distribution needs to be verified.

From Figure 7, we can see that the embedded Markov chain of $Z(t)$, denoted by $X_n$, has a finite state space
$S$, and in $X_n$ each state can reach other states within finite steps and itself within one step. Thus, $X_n$ is
irreducible and ergodic. By Corollary 9-1 (p. 325) in [12], we know that $Z(t)$ is irreducible. Next, let $\mu_{ij}$
denote the expected transition time from state $i$ to $j$; since node behaviors change within finite time,
$\mu_{ij} < \infty$ holds $\forall i, j \in S$. Letting $\mu_i$ denote the expected holding time in state $i$, we have
$\mu_i = \sum_{j \in S} p_{ij} \mu_{ij}$. Thus, $\mu_i < \infty$ holds, which implies that $Z(t)$ is also positive recurrent by
Theorem 9-2 (p. 325) in [12]. Therefore, by Theorem 9-3 (p. 327) in [12], the limiting distribution can be obtained by:

$$P_i = \lim_{t \to \infty} P(Z(t) = i \mid Z(0) = j) = \frac{\pi_i \mu_i}{\sum_{j \in S} \pi_j \mu_j}, \qquad (7)$$

where $\pi_i$ is the stationary probability of state $i$ of $X_n$.


In order to calculate $\pi_i$ and $\mu_i$ in (7), we must obtain the transition probabilities $p_{ij}$ and the transition
time distributions $F_{ij}(t)$, which are described as follows.


•  Transition Probabilities.

To determine $p_{xI}$ ($x \in \{C, F, D\}$), we consider both energy consumption and node mobility behavior,
which are characterized by an average node lifetime, $T_{life}$, and an average node residence time, $T_m$,
respectively. To determine $p_{xD}$ ($x \in \{C, F, I\}$), we assume that a destructive node may choose $k_a$ out of the total
$N$ nodes as victims with probability $q_a$ and needs an average time of $T_{eff}$ to compromise these victim
nodes. Notice that an inactive node is not affected by actions from destructive nodes, but both faulty nodes
and cooperative nodes will be affected. To determine $p_{xF}$ ($x \in \{C, D, I\}$), we assume that destructive
and inactive nodes will not become faulty (or more accurately, they will behave like normal nodes, except for
disruption of network operation). As for cooperative nodes, they are assumed to turn off the packet forwarding
function if their residual energies drop below a fixed fraction of their initial energies, so that they become
faulty at a time $T_{ts}$ that is the corresponding fraction of $T_{life}$. To determine $p_{xC}$ ($x \in \{F, D, I\}$), we assume
that a cooperation stimulating mechanism such as the nuglet counter [31] is used, if the faulty function is due to
energy concerns. For example, each faulty node possesses a certain number of tokens $TC_{max}$ initially and spends
tokens when it sends or receives packets for its own benefit. So faulty nodes must become cooperative if the number of
remaining tokens drops below a threshold $TC_{thr}$. For simplicity, we consider that destructive nodes cannot
become cooperative, while it is possible for an inactive node to be repaired or recharged with an average
recovery time $T_{recr}$. Since $P$ is a stochastic matrix, we can determine $p_{xx}$ ($x \in S$) correspondingly.


3.1.4  Transition  Time  Distributions 


We use the two-parameter Weibull distribution from reliability engineering to define $F_{xI}(t)$ as
$F_{xI}(t) = 1 - \exp(-(t/\beta)^{\alpha})$ ($x \in \{C, F, D\}$), where $\alpha$ is the slope parameter, $\beta = T_{life}/\Gamma(1 + 1/\alpha)$ is the scale
parameter, and $\Gamma(\cdot)$ is the gamma function. $F_{CD}(t)$, $F_{FD}(t)$ and $F_{CF}(t)$ are defined by Weibull distributions
similarly. In this work, we assume that $F_{FC}(t)$ is a uniform distribution with the range $[a, b]$. We further
define $F_{IC}(t)$ and $F_{xx}(t)$ ($x \in S$) by exponential distributions.
Then the complete definitions of $F_{ij}(t)$ are given by (8), where $W(\alpha, \beta)$ denotes the Weibull distribution
with parameters $\alpha$ and $\beta$, $\mathcal{E}(\lambda)$ denotes the exponential distribution with parameter $\lambda$, $U(a, b)$ denotes the uniform
distribution with range $[a, b]$, and $\gamma = \Gamma(1 + 1/\alpha)$.
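
The choice of scale parameter, beta = T_life / Gamma(1 + 1/alpha), makes the mean of the Weibull distribution equal
to the average lifetime. A short numerical check of this relation is sketched below; the function names and the
example values (T_life = 100, alpha = 2) are hypothetical and chosen only for illustration.

```python
import math


def weibull_scale(mean_lifetime, shape):
    """Scale beta such that a Weibull(shape, beta) variable has the given mean."""
    return mean_lifetime / math.gamma(1.0 + 1.0 / shape)


def weibull_cdf(t, shape, scale):
    """F(t) = 1 - exp(-(t / scale) ** shape) for t > 0, and 0 otherwise."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def mean_from_survival(shape, scale, step=0.01, horizon_factor=20.0):
    """Numerical mean via E[T] = integral of (1 - F(t)) dt, using the midpoint rule."""
    n = int(horizon_factor * scale / step)
    return sum(
        (1.0 - weibull_cdf((k + 0.5) * step, shape, scale)) * step
        for k in range(n)
    )


beta = weibull_scale(100.0, 2.0)          # illustrative T_life = 100, alpha = 2
numeric_mean = mean_from_survival(2.0, beta)
```

The same integral, E[T] = ∫(1 − F(t)) dt, is the one used later in the text for the expected sojourn and
transition times.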

After determining $p_{ij}$ and $F_{ij}(t)$, we obtain $\pi$ by:

$$\pi = \pi P, \quad \sum_{i \in S} \pi_i = 1, \quad \pi_i \ge 0, \qquad (9)$$

where $\pi = (\pi_i)$ for $i \in S$. We further obtain $\mu_i$ by:

$$\mu_{ij} = \int_0^{\infty} t \, dF_{ij}(t), \quad \mu_i = \sum_{j \in S} \mu_{ij} p_{ij}, \quad \forall i \in S. \qquad (10)$$

By substituting the results from (9) and (10) into (7), the limiting probability $P_i$ can be obtained.
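
The substitution of (9) and (10) into (7) can be sketched numerically as follows. This is a minimal illustration,
not the project's code: the stationary vector is found by plain power iteration (which assumes an irreducible,
aperiodic embedded chain, consistent with every state being able to return to itself in one step), and the
matrices `P` and `MU` hold hypothetical example values.

```python
def stationary_distribution(P, iterations=10_000, tol=1e-13):
    """Stationary vector of a row-stochastic matrix, pi = pi P, by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def limiting_probabilities(P, mu_pair):
    """Limiting state probabilities P_i = pi_i mu_i / sum_j pi_j mu_j, as in (7).

    mu_pair[i][j] is the mean transition time mu_ij; mu_i = sum_j p_ij mu_ij as in (10).
    """
    n = len(P)
    pi = stationary_distribution(P)
    mu = [sum(P[i][j] * mu_pair[i][j] for j in range(n)) for i in range(n)]
    weights = [pi[i] * mu[i] for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example for the state order (C, F, D, I).
P = [
    [0.10, 0.40, 0.20, 0.30],
    [0.50, 0.10, 0.20, 0.20],
    [0.00, 0.00, 0.30, 0.70],
    [0.80, 0.00, 0.00, 0.20],
]
MU = [[50.0] * 4, [20.0] * 4, [10.0] * 4, [5.0] * 4]  # mean transition times mu_ij
limits = limiting_probabilities(P, MU)
```

States that are visited often but left quickly receive a small limiting probability, which is why both the
stationary vector and the mean holding times enter (7).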

3.1.5  Discussions  and  Summary 

We make the following observations about the proposed nodal model:

(i) This model is able to characterize the transient and steady behavior of wireless nodes in the presence of
multiple, interdependent failures and colluding attacks. The reason for using a Semi-Markov Process (rather than a
continuous-time Markov chain) is that the sojourn time during which a node stays in any state $i \in S$ may
not follow the exponential distribution. For instance, a node is more inclined to be in a failed state due to energy
consumption as time passes, and the less residual energy is left, the more likely a node changes its behavior to
defective, i.e., not forwarding data for others. The SMP allows for arbitrarily distributed sojourn times and can
be viewed as a process with an embedded Markov chain (EMC), denoted by $\{X_n\}$, where the state transitions
occur at the time instants when a node changes its behavior to a new state. In other words, this model enables us to
understand the evolution of node behaviors over time.

(ii) This model can be used to describe a wide variety of random threats as explained in the definition of each
node state, depending on how data are diffused into this model. Specifically, let $t_n$ be the elapsed time between the
$n$-th and $(n+1)$-th transitions; we can then define the associated (time-homogeneous) Semi-Markov kernel
$Q = (Q_{ij}(t))$ by

$$Q_{ij}(t) = \Pr(X_{n+1} = j, t_n \le t \mid X_n = i) = p_{ij} \cdot F_{ij}(t), \qquad (11)$$

where $p_{ij} = \lim_{t \to \infty} Q_{ij}(t) = \Pr(X_{n+1} = j \mid X_n = i)$ is the state transition probability between states $i$ and
$j$, and $F_{ij}(t) = \Pr(t_n \le t \mid X_{n+1} = j, X_n = i)$ is the transition time distribution from state $i$ to $j$. The
distribution matrix $\mathbf{F}(t) = (F_{ij}(t))$ can be determined by using data diffusion with trace files or by assuming a
certain probability distribution. To the best of our knowledge, there are no prior works on this problem. Therefore,
we use the well-known Weibull distribution because of its wide application in the area of reliability
engineering [32, 33, 20]. For instance, the time distribution from the cooperative state (C) to the inactive state (I),
$F_{CI}(t)$, is represented by a two-parameter Weibull distribution $F_{CI}(t) = 1 - \exp(-(t/\beta)^{\alpha})$, where $\alpha$ is usually called
the slope (or shape) parameter and $\beta$ is usually called the scale parameter, which can be adjusted by iterative
matching with a real-time system [34].

(iii) This model can be used for complete and incomplete data traces. One of the concerns for an analytic
model is whether it can be used to estimate or predict future behaviors. More importantly, the model itself must
be sufficiently generic for data diffusion, given complete or incomplete data.

•  When  data  traces  arc  complete,  e.g.,  peacetime  training  data  is  available:  With  no  loss  of  generality,  we 
can  assume  that  all  nodes  in  the  network  arc  cooperative  at  the  initial  time,  i.e.,  Pr(Z( 0)  =  c)  =  1.  The 
transient  distributions  of  the  SMP  {Z(t) },  with  state  space  S  and  Semi-Markov  kernel  O  in  (1 1),  satisfy 

=  Pr(Z(t)  =  j\Z(0)  =  i) 

=  (1  -  Hi(t))5ij  +  V]  [  Qii(T)Pij(t-T)dT,  (12) 

les  Jo 

where  Ht(t)  =  Pr(tn  <  t\Xn  =  i)  =  Qij{t)  is  the  sojourn  time  distribution  in  state  i,  and  r)tJ  is 

the  Kroncckcr  d  function  and  defined  by  1  for  i  =  j  and  0  otherwise.  Then  the  transient  distribution  Pcc{t) 
is  of  particular  interest,  since  it  indicates  the  cooperativeness  of  any  node  at  time  t  >  0.  However,  it  is 
normally  difficult  to  derive  Pcc(t)  in  continuous  time  domain  [35].  Nevertheless,  a  numerical  solution  was 
proposed  in  [35]  to  solve  (12)  by  rewriting  the  transient  distributions  in  discrete-time  domain  as  follows, 

m 

Pij(mh )  =  (1  —  Hi{mh))5ij  +  EE  hQu(xh)Pij(mh  —  xh),  (13) 

les  x=i 

where  h  is  the  discretization  step.  In  addition,  Qu(xh)  may  be  further  approximated  by  the  difference 
quotient  as, 

Qu(xh)  =  ^  (Qu{xh)  -  Qu((x  -  1  )/?■))  for  x  >  1,  (14) 

where  Qu(xh)  is  the  empirical  distribution  of  Qu(t).  By  using  this  method,  as  long  as  we  have  a  complete 
set  of  trace  data  that  record  all  state  transitions  and  time  instants  when  transitions  occur,  the  empirical 
function  Qu(xh)  can  be  computed  by  (14),  then  Pl](rrih)  can  be  computed  by  (13).  In  deed,  this  method 
has  already  been  used  in  [36]  to  model  behaviors  of  user  mobility  based  on  a  large-scale  trace  database 
from  peacetime2.  In  addition,  we  can  also  use  peacetime  data  for  training  purpose  and  simulation  threats 
to  test  the  model. 

•  When  data  traces  are  incomplete:  Unfortunately,  to  the  best  of  our  knowledge,  there  is  no  complete  trace 
data  recording  user  behaviors  in  wireless  ad  hoc  networks,  especially  for  WMD  stressors.  Thus,  we  strive 
to  grasp  the  stochastic  properties  of  a  node  malfunction  by  utilizing  any  statistics  available  and  reasonable 
estimations, 

Let  p  be  the  sojourn  time  in  state  i,  Tl}  be  the  transition  time  from  states  i  to  j,  E[-}  be  the  conventional 
notation  for  expectation,  then  we  have  E[T,]  =  /0°°(1  —  Hi{t))dt  and  E[Tt]\  =  f0°°(l  —  Fij(t))dt.  In  our 
preliminary  work  [37],  we  have  proved  the  following: 

:For  example,  trace  files  can  be  obtained  from  CERT  Statistics  at  http://www.cert.org/stats/ 
Lemma 2. Given the SMP $\{Z(t)\}$ associated with the state space $S$ and the transition probability matrix (TPM), denoted by $P = (p_{ij})$, the transient distribution $P_{ij}(t)$ converges to a limiting probability $P_j$ as $t \to \infty$; further, $P_j$ can be calculated by

\[ P_j = \lim_{t \to \infty} P_{ij}(t) = \frac{\pi_j E[T_j]}{\sum_{i \in S} \pi_i E[T_i]}, \tag{15} \]

where $\pi = \langle \pi_j \rangle$ is the stationary distribution of $\{X_n\}$.


Note that Lemma 2 provides a method to estimate the probability of a node being cooperative without a complete set of trace data. It also indicates that, as $t \to \infty$, the probability of a node being in a particular state can be estimated statistically. Moreover, the limiting probability of a node being in state $i$ also indicates the proportion of nodes in state $i$ in the network. For example, if a node is cooperative with probability 80%, then about 80% of the nodes in the network are expected to be cooperative, which is a clear and concise indication of network health.


(iv) This model allows us to determine the transition probabilities and the mean time of each state under conditions of distributed, random threats. By following this model, we have obtained the mobility-induced failure probability by using our newly developed smooth-based mobility model [38, 13]. To identify defective nodes and estimate their mean time to failure as well as their failure probability, we use an energy threshold-based method in which the residual energy of a wireless device decides whether it forwards data for other nodes or not [39, 40].
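
The two estimation routes above, the discrete recursion (13)-(14) for complete traces and the limiting probability of Lemma 2 for incomplete ones, can be sketched as follows. This is a minimal illustration, not the procedure of [35] or [37]: the function names, the array layout of `Q_hat`, and the two-state example with exponential sojourn times are assumptions made here, and (15) is taken in its normalized form $\pi_j E[T_j] / \sum_i \pi_i E[T_i]$.

```python
import numpy as np

def transient_probs(Q_hat):
    """Discrete-time recursion (13) using the difference quotient (14).

    Q_hat[x, i, l] holds the (empirical) kernel Q_il(x*h) for x = 0..m,
    with Q_hat[0] = 0. Returns P with P[x, i, j] approximating P_ij(x*h).
    """
    m_max, n = Q_hat.shape[0] - 1, Q_hat.shape[1]
    dQ = np.diff(Q_hat, axis=0)      # dQ[x-1] = h * Qdot_il(xh), from (14)
    H = Q_hat.sum(axis=2)            # H_i(xh) = sum over l of Q_il(xh)
    P = np.zeros((m_max + 1, n, n))
    P[0] = np.eye(n)                 # P_ij(0) = delta_ij
    for m in range(1, m_max + 1):
        # sum over l and x = 1..m of dQ_il(xh) * P_lj(mh - xh)
        conv = np.einsum("xil,xlj->ij", dQ[:m], P[m - 1::-1])
        P[m] = np.diag(1.0 - H[m]) + conv
    return P

def limiting_probs(tpm, mean_sojourn):
    """Lemma 2: P_j = pi_j E[T_j] / sum_i pi_i E[T_i]."""
    n = tpm.shape[0]
    # Stationary distribution of the embedded chain: pi = pi P, sum(pi) = 1.
    A = np.vstack([tpm.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    w = pi * np.asarray(mean_sojourn, dtype=float)
    return w / w.sum()

# Hypothetical two-state example (0 = cooperative, 1 = failed) with
# exponential sojourn times of rates a and b; not taken from any trace.
a, b, h, m = 1.0, 2.0, 0.01, 1000
t = h * np.arange(m + 1)
Q_hat = np.zeros((m + 1, 2, 2))
Q_hat[:, 0, 1] = 1.0 - np.exp(-a * t)
Q_hat[:, 1, 0] = 1.0 - np.exp(-b * t)

P_t = transient_probs(Q_hat)
P_lim = limiting_probs(np.array([[0.0, 1.0], [1.0, 0.0]]), [1 / a, 1 / b])
```

In this toy case the last row block of `P_t` should sit close to `P_lim`, up to a discretization error of order $h$, which illustrates the convergence claimed in Lemma 2.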

In the following, we will discuss how to use this model to analyze network survivability when multiple failures are present.


3.2  Analysis  of  Network  Connectivity 

Recall that our objective is to find the probability of an ad hoc network keeping $k$-connectivity in the presence of node failures and communication failures. Based on the node behavior model proposed in Section 3.1.3, we are ready to analyze the connectivity of multi-hop networks stochastically in this section.


3.2.1  Node  Isolation  due  to  Misbehavior 

We begin our analysis by examining the effects of failures, namely the so-called node isolation problem. Figure 8(a) shows the scenario where all the neighbors of node $u$ are faulty nodes. In this case, the number of node-disjoint outgoing paths of $u$ is zero, where the term outgoing path refers to a path through which a node can communicate with nodes at least two hops away. In the scenario shown in Figure 8(b), one of the neighbors of node $u$ is destructive, e.g., a Black Hole. In fact, a single Black Hole neighbor is sufficient to trap all traffic initiated from node $u$ if the destination is beyond the neighborhood of node $u$. In this case, the number of node-disjoint outgoing paths for node $u$ is also zero. If node $u$ is surrounded by one or more other destructive nodes, then the throughput of the data stream via the destructive node will become zero after a short time period, which is especially harmful for long communication sessions.
Figure 8: Node Isolated by Non-Cooperative Neighborhood.

Let $N_{op}(u)$ denote the number of node-disjoint outgoing paths of node $u$; then node $u$ is isolated from the network if $N_{op}(u) = 0$. Considering that there exist only two types of destructive nodes in this context, one that traps all traffic and the other that has abnormal routing functions (e.g., JellyFish), we have the following observation:

Lemma  3.  A  node  u  is  isolated  if  it  has  at  least  one  Black  Hole  neighbor  or  the  total  number  of  faulty,  JellyFish, 
and  inactive  neighbors  is  d,  given  it  has  d  neighbors. 

By Lemma 3, let $D$ denote the number of neighbors of a node; then we obtain the probability of a node being isolated, given that the node has $d$ neighbors, as

\[ \Pr(N_{op} = 0 \mid D = d) = 1 - (1 - P_{BH})^d + (1 - P_c - P_{BH})^d, \tag{16} \]

where $P_c$ and $P_{BH}$ are the probabilities that a node is cooperative and a Black Hole, respectively. Consequently, a node must have at least one cooperative neighbor and no Black Hole neighbor to stay connected to the network.
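
As a quick sanity check of (16), the closed form can be compared with a brute-force enumeration of all neighbor-state combinations under the rule of Lemma 3. This is only a sketch; the function names and the three-way state labels ("C", "BH", "O" for all other states) are chosen here for illustration.

```python
from itertools import product

def isolation_probability(d, p_c, p_bh):
    """Eq. (16): Pr(N_op = 0 | D = d)."""
    return 1.0 - (1.0 - p_bh) ** d + (1.0 - p_c - p_bh) ** d

def isolation_probability_bruteforce(d, p_c, p_bh):
    """Enumerate the states of d independent neighbors.

    A node is isolated if at least one neighbor is a Black Hole or
    if none of its neighbors is cooperative (Lemma 3).
    """
    probs = {"C": p_c, "BH": p_bh, "O": 1.0 - p_c - p_bh}
    total = 0.0
    for states in product(probs, repeat=d):
        if "BH" in states or "C" not in states:
            p = 1.0
            for s in states:
                p *= probs[s]
            total += p
    return total

closed_form = isolation_probability(4, 0.7, 0.1)
enumerated = isolation_probability_bruteforce(4, 0.7, 0.1)
```

Both routes give the same value for small $d$, which confirms that the two terms in (16) correspond to disjoint events.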

3.2.2 Condition of Keeping A Node k-Connected

Let $n_c(u)$, $n_{BH}(u)$ and $n_g(u)$ denote the number of cooperative, Black Hole and all other neighbors of node $u$, respectively; then, based on the analysis of the node isolation problem, we have

Theorem 1. A node $u$ has $k$ node-disjoint outgoing paths if and only if $u$ has $k$ cooperative neighbors and no Black Hole neighbor, i.e., $\{N_{op}(u) = k\} \Leftrightarrow \{n_c(u) = k, n_{BH}(u) = 0\}$ for $k \ge 1$.

Notice that the events of any node being in a certain behavior state are mutually independent; then, by the multinomial probability law, the joint distribution of $n_c$, $n_{BH}$, $n_g$ is a multinomial distribution. By Theorem 1, the probability of a node being $k$-connected to the network, given that the node has $d$ neighbors, is

\[ \Pr(N_{op} = k \mid D = d) = \Pr(n_c = k, n_{BH} = 0, n_g = d - k) = \frac{d!}{k!\,(d-k)!}\,(P_c)^k\,\bar{P}^{\,d-k}, \quad k \ge 1, \tag{17} \]

where $\bar{P} = 1 - P_c - P_{BH}$ denotes the probability of a node being neither cooperative nor a Black Hole.
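
Taken together, (16) and (17) describe the full conditional distribution of the number of outgoing paths of a node with $d$ neighbors. The sketch below tabulates that distribution and lets one confirm that it sums to one; the helper names are hypothetical.

```python
from math import comb

def prob_isolated(d, p_c, p_bh):
    """Eq. (16): Pr(N_op = 0 | D = d)."""
    p_other = 1.0 - p_c - p_bh
    return 1.0 - (1.0 - p_bh) ** d + p_other ** d

def prob_k_paths(k, d, p_c, p_bh):
    """Eq. (17): Pr(N_op = k | D = d) for 1 <= k <= d, else 0."""
    if k < 1 or k > d:
        return 0.0
    p_other = 1.0 - p_c - p_bh
    return comb(d, k) * p_c ** k * p_other ** (d - k)

def path_count_distribution(d, p_c, p_bh):
    """List of Pr(N_op = k | D = d) for k = 0..d."""
    head = [prob_isolated(d, p_c, p_bh)]
    tail = [prob_k_paths(k, d, p_c, p_bh) for k in range(1, d + 1)]
    return head + tail

dist = path_count_distribution(6, 0.7, 0.05)
```

The entries for $k \ge 1$ only cover outcomes without a Black Hole neighbor, so all probability mass of Black Hole outcomes is absorbed by the $k = 0$ entry.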
3.2.3 Probability of k-Connectivity of Individual Node

Let $\theta(\mathcal{M}_a) = \min\{N_{op}(u) \mid N_{op}(u) \in \mathbb{N}, u \in \mathcal{M}_a\}$; we have the condition to keep a network $k$-connected as follows:

Theorem 2. A multi-hop network $\mathcal{M}_a$ with $N_a$ nodes is $k$-connected if and only if any active node $u$ of $\mathcal{M}_a$ has at least $k$ node-disjoint outgoing paths, when $N_a$ is sufficiently large.

Therefore, by Theorem 2, the probability of a network being $k$-connected can be represented by:

\[ \Pr(\kappa(\mathcal{M}_a) = k) = \Pr(\theta(\mathcal{M}_a) \ge k). \tag{18} \]

We assume that the numbers of outgoing paths $N_{op}(u)$ of different nodes $u$ are independent; then from (18), we have:

\[ \Pr(\kappa(\mathcal{M}_a) = k \mid N_a) = (1 - \Pr(N_{op} < k))^{N_a}, \tag{19} \]

where $N_a$ is the number of active nodes. By the total probability law, we have

\[ \Pr(N_{op} < k) = \sum_{d=k}^{\infty} \Pr(N_{op} < k \mid D = d)\,\Pr(D = d). \tag{20} \]

To solve this problem, we need to find $\Pr(N_{op} < k \mid D = d)$ and $\Pr(D = d)$. By (17), $\Pr(N_{op} < k \mid D = d)$ is given immediately by:

\[ \Pr(N_{op} < k \mid D = d) = 1 - (1 - P_{BH})^d + \sum_{m=0}^{k-1} \frac{d!}{m!\,(d-m)!}\,(P_c)^m\,(1 - P_c - P_{BH})^{d-m}. \tag{21} \]


To derive $\Pr(D = d)$, we assume that all nodes move randomly over a finite area of size $A$. We virtually divide the area into $N'$ small grids so that the grid size is of the same order as the physical size of a node. Since the network area is normally much larger than the physical size of a node, the probability that a node occupies a specific grid, denoted by $p'$, is very small. With large $N'$ and small $p'$, the node distribution can be modeled by a Poisson point process. Then we have

\[ \Pr(D = d) \approx \frac{\mu_0^d}{d!}\,e^{-\mu_0}, \tag{22} \]

where $\mu_0 = \rho \pi r_0^2$, $\rho$ is the node density depending on the underlying mobility model, and $r_0$ is the transmission range of nodes.

Finally,  by  (19),  (20),  (21)  and  (22),  we  obtain: 


Pr(n(MA)  =  k\Na ) 

.  .  r(fc>Fo)  T(k,po(l -PBH))\  Pn prh  T(k,p0Pc)]Na  m 

-  _  r(k)  +e  r(t)  )  r(t)  J  ' 

where $\Gamma(\cdot)$ and $\Gamma(k, x) = (k-1)!\,e^{-x} \sum_{l=0}^{k-1} x^l / l!$ are the complete and incomplete Gamma functions, respectively.
Recall that the network survivability has been defined in (6) as the probability that all active nodes are $k$-connected to a network. The survivability of a network $\mathcal{M}$ can be given by the probability that all active nodes have a cooperative degree of at least $k$, i.e.,

\[ NS_k(\mathcal{M}) \approx \Pr(\theta(\mathcal{M}_a) \ge k), \tag{24} \]

where $\mathcal{M}_a$ is the sub-network of $\mathcal{M}$ induced by all active nodes. Based on the above equation, we are ready to derive the bounds for network survivability next.

3.2.4  Bounds  of  Network  Survivability 

Although (24) offers a guideline on deriving $NS_k(\mathcal{M})$, it is quite challenging to find the distribution of $\theta(\mathcal{M}_a)$. Indeed, $\Pr(\theta(\mathcal{M}_a) \ge k)$ is equivalent to the joint probability of every active node being at least $k$-connected to the network, i.e.,

\[ NS_k(\mathcal{M}) \approx \Pr\Big(\bigcap_{u \in \mathcal{M}_a} D_c(u) \ge k\Big). \tag{25} \]

It has been shown that some random graph models do not generate correlation between the degrees of a pair of adjacent nodes [41]; however, this lack of correlation does not imply the independence of node degrees or of cooperative degrees. Since deriving the joint probability is intractable, we approximate the survivability by finding its asymptotic upper and lower bounds.

To provide an upper bound, recall that our network model described in Section 3.1.3 is a geometric random graph $\mathcal{G}(\mathcal{M}, r)$, in which $N$ vertices are uniformly and randomly distributed on a 2-D square with area $A$. The vertex set can be represented by a (homogeneous) Poisson point process $\mathcal{H}_\lambda$ with density $\lambda = N/A$. By the definition of a (homogeneous) Poisson point process, the numbers of points within disjoint subareas are mutually independent random variables (with identical distribution). Thus, we can find $N/(\lambda \pi r^2)$ (active) points, denoted by $\mathcal{N}_d$, so that their transmission ranges ($\pi r^2$) are disjoint (non-overlapping) subareas (disks). As a result, the degrees of two nodes $u$ and $v$ are mutually independent for $u, v \in \mathcal{N}_d$. Similarly, $D_c(u)$ and $D_c(v)$ are mutually independent as well. Based on the explanation above, we have an upper bound for $NS_k(\mathcal{M})$ given by

\[ NS_k(\mathcal{M}) \le \Pr\Big(\bigcap_{u \in \mathcal{N}_d} D_c(u) \ge k\Big) = (1 - \Pr(D_c(u) < k))^{N/(\lambda \pi r^2)}. \tag{26} \]

Thus, once we obtain the distribution function of the cooperative degree, we can calculate the upper bound of survivability.

Next, we explain how to obtain a lower bound for survivability. We first rewrite (25) as

\[ NS_k(\mathcal{M}) \approx 1 - \Pr\Big(\bigcup_{u \in \mathcal{M}_a} D_c(u) < k\Big). \tag{27} \]
Let $N_a$ denote the number of active nodes in the network and $1_{\{\cdot\}}$ denote the indicator function; then we can bound $\Pr\big(\bigcup_{u \in \mathcal{M}_a} D_c(u) < k\big)$ from above by using Boole's inequality,

\[ \Pr\Big(\bigcup_{u \in \mathcal{M}_a} D_c(u) < k\Big) = E\Big[ E\Big[ 1_{\{\bigcup_{u \in \mathcal{M}_a} D_c(u) < k\}} \,\Big|\, N_a \Big] \Big] \le E\Big[ \sum_{u=1}^{N_a} E\big[ 1_{\{D_c(u) < k\}} \big] \Big] = E[N_a] \cdot \Pr(D_c(u) < k). \tag{28} \]

Notice that the expected value of $N_a$ is equal to $N(1 - P_f)$, i.e., $E[N_a] = N(1 - P_f)$, where $P_f$ is the (limiting) probability of a node being in the failed state, defined in (15). We obtain a lower bound for $NS_k(\mathcal{M})$ as

\[ NS_k(\mathcal{M}) \ge 1 - N(1 - P_f) \cdot \Pr(D_c(u) < k). \tag{29} \]

Again, to solve (29), we need to determine $\Pr(D_c < k)$.

From (20), we need to find $\Pr(D = d)$ and $\Pr(D_c < k \mid D = d)$.

First, to derive $\Pr(D = d)$, we use the (de)Poissonization technique presented in [42, 43]. As mentioned previously, the communication graph of a network $\mathcal{M}$ is associated with a homogeneous Poisson process $\mathcal{H}_\lambda$ with density $\lambda = N/A$. Since we are particularly interested in the topological survivability of active nodes, let $A_0 = \pi r^2$ denote the area covered by a node's transmission range; it is known that the number of active nodes within $A_0$ is a Poisson random variable with mean $A_0 \cdot (N_a / A)$. Thus, $\Pr(D = d)$ can be approximated by

\[ \Pr(D = d) = \frac{\mu_a^d}{d!}\,e^{-\mu_a}, \tag{30} \]

where $\mu_a = \pi r^2 N (1 - P_f) / A$ is the Poisson density. A similar result was also presented in [26], which gives more general results for non-uniform node distributions.

Second, we derive $\Pr(D_c < k \mid D = d)$. Since the cooperative degree of a node cannot be greater than its degree, $\Pr(D_c < k \mid D = d)$ is always equal to 1 when $d < k$. When $d \ge k$, $\Pr(D_c < k \mid D = d)$ ($k \ge 1$) can be calculated by

\[ \Pr(D_c < k \mid D = d) = \sum_{m=1}^{k-1} \Pr(D_c = m \mid D = d) + \Pr(D_c = 0 \mid D = d), \tag{31} \]

in which $\Pr(D_c = 0 \mid D = d)$ is the node isolation probability given in (16), and $\Pr(D_c = m \mid D = d)$ is the probability of a node being $m$-connected given in (17). With these two terms, we can rewrite (31) as

\[ \Pr(D_c < k \mid D = d) = 1 - (1 - P_B)^d + \sum_{m=0}^{k-1} \binom{d}{m} P_c^m (1 - P_c - P_B)^{d-m}. \tag{32} \]

Thus, by utilizing (30) and (32), $\Pr(D_c < k)$ can be obtained from (20), and the upper and lower bounds of the network survivability can be further obtained from (26) and (29). We present our main result next.
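
The combination of (30) and (32) can also be checked numerically. The sketch below sums the conditional probability (32) against the Poisson degree distribution (30) over all $d \ge 0$, using the fact that (32) evaluates to 1 for $d < k$, and compares the result with the closed form $1 - e^{-\mu_a P_B}(1 - \Gamma(k, \mu_a P_c)/\Gamma(k))$ that this summation yields. Function names, the truncation rule for the infinite sum, and the parameter values are assumptions made for illustration.

```python
from math import comb, exp, sqrt

def cond_prob_below_k(k, d, p_c, p_b):
    """Eq. (32); the sum is capped at m = d so that it equals 1 for d < k."""
    p_o = 1.0 - p_c - p_b
    partial = sum(comb(d, m) * p_c ** m * p_o ** (d - m)
                  for m in range(min(k, d + 1)))
    return 1.0 - (1.0 - p_b) ** d + partial

def prob_below_k_numeric(k, mu, p_c, p_b):
    """Sum (32) against the Poisson degree distribution (30)."""
    d_max = int(mu + 12.0 * sqrt(mu) + 30)   # truncation point (assumption)
    pmf, total = exp(-mu), 0.0               # pmf starts at Pr(D = 0)
    for d in range(d_max + 1):
        total += cond_prob_below_k(k, d, p_c, p_b) * pmf
        pmf *= mu / (d + 1)                  # Poisson recursion to Pr(D = d+1)
    return total

def reg_upper_gamma(k, x):
    """Gamma(k, x) / Gamma(k) = exp(-x) * sum_{l<k} x^l / l! for integer k."""
    term, acc = 1.0, 0.0
    for l in range(k):
        acc += term
        term *= x / (l + 1)
    return exp(-x) * acc

def prob_below_k_closed(k, mu, p_c, p_b):
    """Closed form of Pr(D_c < k) obtained from the Poisson sum."""
    return 1.0 - exp(-mu * p_b) * (1.0 - reg_upper_gamma(k, mu * p_c))

numeric = prob_below_k_numeric(3, 12.7, 0.8, 0.05)
closed = prob_below_k_closed(3, 12.7, 0.8, 0.05)
```

The two values agree up to the truncation error of the Poisson tail, which is the step that turns the sum over $d$ into the Gamma-function expressions used in the bounds.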
3.2.5 Main Result and Implications

Theorem 3. For a wireless ad hoc network $\mathcal{M}$ in the presence of node misbehavior and failures, when the number of nodes $N$ is sufficiently large, the network survivability defined in (6) is upper bounded asymptotically by

\[ NS_k(\mathcal{M}) \le \left( e^{-\mu_a P_B} \left( 1 - \frac{\Gamma(k, \mu_a P_c)}{\Gamma(k)} \right) \right)^{N/(\lambda \pi r^2)} \tag{33} \]

and lower bounded asymptotically by

\[ NS_k(\mathcal{M}) \ge 1 - N(1 - P_f) \left( 1 - e^{-\mu_a P_B} \left( 1 - \frac{\Gamma(k, \mu_a P_c)}{\Gamma(k)} \right) \right), \tag{34} \]

where $\mu_a = \lambda \pi r^2 (1 - P_f)$, $\lambda = N/A$ is the node density, and $\Gamma(k) = (k-1)!$ and $\Gamma(k, x) = (k-1)!\,e^{-x} \sum_{l=0}^{k-1} x^l / l!$ are the complete and incomplete Gamma functions, respectively.

The above theorem directly quantifies the impact of different node behaviors on survivability. From the upper and lower bounds given in (33) and (34), respectively, we have the following observations by numerical analysis.

1. In general, the survivability increases with the cooperative probability $P_c$, which agrees with our intuition. When the network area $A$ is fixed, the larger the number of nodes $N$ is, the higher the survivability is, due to the increased density. If the density is fixed instead, increasing $N$ reduces the survivability. This implies that it becomes more difficult to achieve the same survivability level as the network scale gets larger without increasing node density substantially.

2. Given two networks $\mathcal{M}_1$ and $\mathcal{M}_2$ with the same $N$, $A$, and $P_c$, suppose that, besides cooperative nodes, $\mathcal{M}_1$ has failed nodes only and $\mathcal{M}_2$ has misbehaving (selfish and JellyFish) nodes only; then $NS_k(\mathcal{M}_1) < NS_k(\mathcal{M}_2)$ always holds. The more severe impact of node failures is due to the fact that failed nodes are also isolated from the network, which reduces the density of active nodes (e.g., $\mu_a$).

3. For given $N$, $P_f$, and $P_c$, both the upper and lower bounds of the survivability decrease almost exponentially in $\mu_a P_B$. An interesting observation is that when $P_B$ is not zero, a network with higher density can have a lower survivability. Recall from Section 3.2.1 that a Black Hole node may mislead the path selections of its neighborhood and trap surrounding traffic; thus the negative impact of Black Hole nodes increases if they are located in an area with high density.
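
The observations above can be explored numerically with a small helper that evaluates the two bounds of Theorem 3. This is a sketch under stated assumptions: the upper bound uses the exponent $N/(\lambda \pi r^2)$ as read from (26), the lower bound is clamped at zero as described in Remark 1, $\mu_a$ is taken as $\lambda \pi r^2 (1 - P_f)$ in line with (30), and the function and parameter names are hypothetical.

```python
from math import exp, pi

def reg_upper_gamma(k, x):
    """Gamma(k, x) / Gamma(k) = exp(-x) * sum_{l<k} x^l / l! for integer k."""
    term, acc = 1.0, 0.0
    for l in range(k):
        acc += term
        term *= x / (l + 1)
    return exp(-x) * acc

def survivability_bounds(n, area, r, k, p_c, p_b, p_f):
    """Return (lower, upper) bounds on NS_k, following (34) and (33)."""
    lam = n / area                                # node density
    mu_a = lam * pi * r ** 2 * (1.0 - p_f)        # mean active degree, cf. (30)
    # q = 1 - Pr(D_c < k): a node has >= k cooperative and no Black Hole neighbors
    q = exp(-mu_a * p_b) * (1.0 - reg_upper_gamma(k, mu_a * p_c))
    upper = q ** (n / (lam * pi * r ** 2))        # assumed exponent, see lead-in
    lower = max(0.0, 1.0 - n * (1.0 - p_f) * (1.0 - q))
    return lower, upper

# Parameters borrowed from the simulation setup: 200 nodes, 1000 m x 1000 m, r = 150 m.
low_a, up_a = survivability_bounds(200, 1.0e6, 150.0, 2, 0.6, 0.0, 0.05)
low_b, up_b = survivability_bounds(200, 1.0e6, 150.0, 2, 0.9, 0.0, 0.05)
low_c, up_c = survivability_bounds(200, 1.0e6, 150.0, 2, 0.9, 0.01, 0.05)
```

Comparing the first two calls shows both bounds rising with $P_c$ (observation 1); comparing the last two shows how even a small $P_B$ pulls both bounds down (observation 3).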

Note that in real networks the nodes in the vicinity of the network (simulation) boundary have fewer (active) neighbors and thus become isolated easily, which is known as the border effect. As pointed out in [44], the border effect is negligible in analysis if the network area is much larger than the transmission coverage area of a single node and the node density is not high. Since the survivability bounds given in (33) and (34) are all asymptotic for sufficiently large $N$ and we are particularly interested in large-scale extended networks (with fixed density) [44], the border effect is not considered in our derivation. For further discussion of the border effect, readers are referred to [26, 44] and the references therein.
Remark 1. It is a premise that $\Pr(D_c < k) < 1/(N(1 - P_f))$ should hold to guarantee a positive lower bound in (34); otherwise, the lower bound is zero. When $\Pr(D_c < k) = o(1/N)$, we have the following approximation

\[ 1 - N(1 - P_f) \left( 1 - e^{-\mu_a P_B} \left( 1 - \frac{\Gamma(k, \mu_a P_c)}{\Gamma(k)} \right) \right) \approx \left( e^{-\mu_a P_B} \left( 1 - \frac{\Gamma(k, \mu_a P_c)}{\Gamma(k)} \right) \right)^{N(1 - P_f)}, \tag{35} \]

and the left-hand side (LHS) is always less than the right-hand side (RHS) in the above equation as $N \cdot \Pr(D_c < k) \ll 1$. Since the upper bound given in (33) is quite loose, we conjecture that the RHS of (35) is a tight upper bound for network survivability. Indeed, if the cooperative degrees $D_c(u)$ are assumed to be independent (as independent degrees are assumed in [45]), the RHS of (35) becomes the closed-form approximation for network survivability.
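
The step behind (35) can be spelled out as follows, writing $x = \Pr(D_c < k)$ and $n = N(1 - P_f)$ as shorthand (both symbols are introduced here only for illustration):

```latex
\begin{align*}
(1 - x)^n &= 1 - nx + \binom{n}{2} x^2 - \cdots \approx 1 - nx
  \quad \text{when } nx \ll 1, \\
(1 - x)^n &\ge 1 - nx
  \quad \text{for } 0 \le x \le 1,\ n \ge 1
  \quad \text{(Bernoulli's inequality)}.
\end{align*}
```

With $1 - x = e^{-\mu_a P_B}\left(1 - \Gamma(k, \mu_a P_c)/\Gamma(k)\right)$, the first line gives the approximation in (35), and the second line explains why its LHS does not exceed its RHS.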


Remark 2. A special case of our result in Theorem 3 is that all nodes are cooperative and node isolations are due to the lack of neighbors only. In this case, the survivability of a network can be simplified to $(1 - \Gamma(k, \lambda \pi r^2)/(k-1)!)^N$ by considering (34) and (35), which is exactly the probabilistic $k$-connectivity approximation given in [45]. This indicates that our result provides a more general quantitative evaluation of topological survivability. Moreover, our result, especially the lower bound, is of interest not only for theoretical analysis but also for the practical design of survivable wireless ad hoc networks. For example, if the statistics of user behaviors are available, we can use the methods proposed in Section 3.1.3 to estimate the state probabilities. Then, given a desired survivability preference (e.g., > 0.9), the minimum cooperative degree or the number of nodes can be calculated as theoretical guidance to determine a proper network deployment so that the survivability preference can be achieved with high probability.


Up to now, we have solved the SNM-Problem by providing a loose upper bound and a tight lower bound that approximate the network survivability in closed form, in which the impacts of node misbehavior and failures can be evaluated directly. Next, we conduct exhaustive simulations to confirm our analytical result.


3.2.6  Simulation  Results 

Up to now, we have obtained the stochastic properties of the impact of node behaviors on network connectivity. In this section, we evaluate our node behavior model and the network connectivity of ad hoc networks by simulations.

In this work, we use NS2-v2.27 and MATLAB-v6.5 to perform the simulations. Unless specified otherwise, all simulations are performed in a 1000 x 1000 m² square area, over which 200 mobile nodes with a transmission range of 150 m are distributed uniformly. IEEE 802.11 is used for medium access control and AODV is used as the routing protocol. BonnMotion [46] is used to generate Gauss-Markov movement scenarios. In order to calculate the probability of connectivity, we collected the neighborhood statistics of each node every 10 seconds, including the number of neighbors and the behavior of each neighbor. With this information, the number of outgoing paths of each node can be obtained, and then the probability of $k$-connectivity can be calculated.
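
The neighborhood-counting step described above can be illustrated with a much simpler static Monte Carlo experiment. The sketch below is not the NS2/BonnMotion setup used in this work: it drops nodes uniformly at random without mobility or routing, assigns behaviors independently, and applies Theorem 1 to count outgoing paths. All names, the three-way state coding, and the default number of trials are assumptions.

```python
import numpy as np

def simulate_k_connectivity(n, side, r, k, p_c, p_bh, trials=20, seed=0):
    """Static Monte Carlo estimate based on Theorems 1 and 2.

    Returns (node_fraction, network_fraction): the average fraction of nodes
    with at least k outgoing paths, and the fraction of trials in which
    every node has at least k outgoing paths.
    """
    rng = np.random.default_rng(seed)
    p_other = max(0.0, 1.0 - p_c - p_bh)
    node_hits, net_hits = 0.0, 0
    for _ in range(trials):
        xy = rng.uniform(0.0, side, size=(n, 2))
        # 0 = cooperative, 1 = Black Hole, 2 = any other state
        state = rng.choice(3, size=n, p=[p_c, p_bh, p_other])
        d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
        adj = (d2 <= r * r) & ~np.eye(n, dtype=bool)
        n_c = (adj & (state == 0)[None, :]).sum(axis=1)
        n_bh = (adj & (state == 1)[None, :]).sum(axis=1)
        # Theorem 1: N_op equals the cooperative degree if no Black Hole neighbor
        n_op = np.where(n_bh == 0, n_c, 0)
        ok = n_op >= k
        node_hits += ok.mean()
        net_hits += int(ok.all())
    return node_hits / trials, net_hits / trials

# Setup mirroring the text: 200 nodes, 1000 m x 1000 m area, 150 m range.
results = {k: simulate_k_connectivity(200, 1000.0, 150.0, k, 0.9, 0.0)
           for k in (1, 2, 3)}
```

Because the same seed is reused for every $k$, the three runs see identical node placements, so the estimates can only decrease as $k$ grows. Nodes near the border have fewer neighbors in this sketch as well, so it also exhibits the border effect discussed earlier.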

•  Probability  of  A  Node  Being  Cooperative 
Figure 9: Probability of A Node Being Cooperative $P_c$. Panels: (a) Effect of Node Mobility; (b) Effect of Faulty Nodes; (c) Effect of Routing Malfunction.


As explained in the modeling of node behavior, node mobility is represented by the average residence time $T_{in}$. The smaller $T_{in}$ is, the faster a node will leave a network and the less cooperative it is to its neighbors. As shown in Figure 9(a), the cooperative probability $P_c$ is proportional to $T_{in}$ when $T_{in} < T$ and remains a constant of $1/T_{life}$ afterward. From Figure 9(a), $P_c$ is affected by the initial energy $E_{init}$ as well, i.e., a node with a higher $E_{init}$ is more likely to be cooperative. From the discussion of Stochastic Properties of Node Behavior Model in Section 3.1.3, as $\eta$ increases, $p_{cs}$ keeps decreasing until $1/T_{life}$, which reflects the fact that a node is more likely to be cooperative if it takes a longer time to become selfish. Therefore, in Figure 9(b), $P_c$ increases quickly at the beginning, then remains almost constant afterward. Meanwhile, we can see that a higher token threshold $TC_{thr}$ can increase $P_c$ effectively, which shows that it is necessary to use a cooperation stimulating mechanism to mitigate selfish behavior. Moreover, by Section 3.1.3, the shorter $T_{eff}$ is, the more likely a node is to be compromised and become destructive, which makes $P_c$ proportional to $T_{eff}$, as shown in Figure 9(c). If the fraction of vulnerable nodes among all nodes, $k_a/N$, is increased from 0.01 to 0.05, then the cooperative probability $P_c$ drops dramatically for the same $T_{eff}$. Thus, we conclude that external attacks can impact $P_c$ substantially.

• Probability of k-connectivity

Figure 10: Probability of k-connectivity: Effects of Node Behaviors. Panels: (a) Effect of Cooperative Probability $P_c$; (b) Effect of Inactive Node Probability $P_i$; (c) Effect of Faulty Probability $P_f$.

We now study how network connectivity is impacted by misbehaving nodes and node failures. Figure 10(a) shows the simulation results of the probabilities of $k$-connectivity against $P_c$ for $k = 1, 2, 3, 4$, respectively. In this experiment, $P_i$ and $P_{BH}$ are set to 0 such that we can observe the effect of $P_c$ clearly. From Figure 10(a), the probability of $k$-connectivity decreases as $k$ increases for constant $P_c$, and increases with $P_c$ for constant $k$. To obtain a higher $k$-connectivity, it is necessary to have a higher $P_c$.

In order to see the effect of the probability of node failure, we set both $P_f$ and $P_d$ to zero to eliminate the impact of misbehaving nodes in our simulations. From Figure 10(b), the probability of $k$-connectivity decreases very fast as $P_i$ increases. As we expected, the impact of $P_i$ is more significant for a highly connected network, e.g., the probability of 3-connectivity drops to 0.4 even at $P_i = 0.2$.

In the same way, we obtain the results for node selfishness shown in Figure 10(c). Similar to Figure 10(b), the plot in this figure indicates that the probability of $k$-connectivity decreases as the selfish probability $P_f$ increases. Nevertheless, differing from the results in Figure 10(b), the probability of $k$-connectivity does not change significantly when $P_f$ starts to increase, especially for lower $k$. Notice that the number of active nodes $N_a$ decreases as $P_i$ increases, which makes the network sparser in terms of the decreased node density (e.g., $\rho = N_a/A$ if the node distribution is uniform). Therefore, node failures have more severe partitioning effects than selfish nodes.

Compared to the analytical results, the simulation results are lower, which can be explained by the border effect, i.e., the nodes in the vicinity of the simulation boundary have fewer neighbors and thus become isolated easily. Therefore, the analytical result provides an upper bound for the probability of $k$-connectivity.

• k-connectivity Impacted by Other Parameters

Figure 11: Probability of k-Connectivity: Effects of System Parameters. Panels: (a) k-Connectivity Impacted by $P_{BH}$; (b) k-Connectivity Impacted by $N$; (c) k-Connectivity Impacted by $r_0$.

In addition to node behaviors, we continue to evaluate the impact of other system parameters on network connectivity. Here we look at the effect of Black Holes, which occur with probability $P_{BH}$. By (23), $P_{BH}$ has a tremendous influence on the probability of $k$-connectivity. Analytical results are illustrated in Figure 11(a), from which we can see that Black Hole is the most harmful behavior since it destroys network connectivity much more severely than node failures do. Recalling the node isolation issue discussed in Section 3.2.1, we find that a Black Hole can actually isolate all its neighbors, and its scope of influence is extended when it roams in the network.

Next, we discuss the effect of the system size $N$ on network connectivity. In this simulation, the transmission range $r_0$ is set to 100 m to enlarge the system size $N$. Figure 11(b) shows that the required network size $N$ should be enlarged to guarantee the same connectivity when destructive or inactive nodes are present. To discuss the effect of the node's transmission range $r_0$ on network connectivity, we change the system size $N$ from 200 to 150 to enlarge the change of $r_0$. Figure 11(c) shows that the higher the required $k$-connectivity, the larger the $r_0$ needed. Similar to the analysis of Figure 11(b), we conclude that the required $r_0$ has to be increased to guarantee the same $k$-connectivity if destructive or inactive nodes exist in a mobile ad hoc network.

3.2.7  Summary 

In this work, we focused on modeling and analyzing the impact of node and communication failures on the network connectivity of multi-hop wireless networks, which has rarely been studied before. We first classified node behaviors into four types: cooperative, faulty, destructive and failed, and then proposed a node behavior model by employing a semi-Markov process. In our model, mobile nodes change their behaviors according to a well-defined transition probability matrix and transition time distribution matrix. After obtaining the limiting probability of a node being in each behavior state, we analyzed the node isolation problem resulting from misbehaving neighbor nodes and provided the condition under which mobile nodes can stay connected to a mobile ad hoc network. As a consequence, we obtained the closed-form approximation and upper bound of network survivability, that is, the probability of a network being $k$-connected.

3.3 k-Connectivity Routing

In this project, we target military networking infrastructures, such as combat strategic systems and tactical ad hoc networks, which carry time-sensitive information and demand reliable and non-disruptive communications. So far, we have analyzed network survivability in the presence of multiple failures resulting from node mobility, energy depletion and operations, and various faulty node behaviors. We also derived network survivability when communication operations are interrupted. Through detailed analysis, we obtained a closed-form representation of the probability of node isolation and of the survivability of $k$-connected networks. Based on our models and analysis, we aim to design a network protocol that is robust against random failures and routing misbehavior.

3.3.1  Objectives  and  Approaches 

How to improve the network performance, or at least maintain graceful performance degradation, of wireless multi-hop networks in the presence of misbehaving nodes is an important design issue and research problem. Previous works on tackling misbehaving nodes in wireless multi-hop networks can be classified into three categories: cryptography-based secure routing protocols, incentive-based cooperation stimulating mechanisms, and multipath-based reliable forwarding schemes. However, none of the existing works can fully mitigate the impact of faulty nodes, which may be routing-compliant and may not be malicious. Instead, such nodes cannot operate properly due to random threats, such as dysfunctional devices; connection or link failures; incomplete knowledge of the function and trustworthiness of other nodes caused by disruption resulting from biological, chemical, or electromagnetic pulse threats; and faulty networking behaviors due to cyber-attacks. More importantly, none of the previous reputation schemes has considered whether excluding faulty nodes will impair network connectivity, which is, however, a prerequisite for all communications. Therefore, routing-compliant misbehavior is an open and challenging problem for the design of wireless multi-hop networks.

We approach $k$-connectivity routing from the perspective of topology control because the DTRA mission is to provide strategies for robust network architecture. More specifically, we strive to design a resilient topology over a network such that network operations on both the control and data planes are distributed only between cooperative neighbors and the network is $k$-connected with high probability (e.g., > 0.9). We first define a new metric called resilient capacity to measure the maximum number of destructive nodes that a network can sustain under the constraint that the network is $k$-connected with a certain probability. By analyzing the theoretical bound on resilient capacity, we are able to know how many destructive nodes can be excluded without impairing the connectivity of the network (to a certain level). Further, we show that an optimal resilient topology can maximize the resilient capacity and satisfy the connectivity requirement if all and only cooperative nodes are included in the topology, in which each node has at least $k$ cooperative neighbors and the average degree scales with $\log(N)$, where $N$ is the network size.

To achieve the optimal topology, we next determine node behavior dynamics by proposing a node
cooperativity measurement scheme to quantify the likelihood of any node being cooperative. Using the
measurement scheme, we then design a distributed topology control protocol called PROACtive to enhance
the network resilience to routing-compliant misbehavior. By applying the PROACtive protocol, every node is
able to select more than k cooperative neighbors and exclude misbehaving nodes from its neighbor set
dynamically. As a result, the union of all cooperative neighbor sets forms a resilient topology which
satisfies the connectivity requirement and maximizes the resilient capacity (on a best-effort basis).

Compared with other works, the design of the PROACtive protocol has several unique features and advantages.
First, reliable data delivery is achievable with the assistance of the routing protocol since control packets
are only dispersed among the member nodes in the generated topology; second, our approach does not introduce
new security vulnerabilities and can avoid the false accusation problem; third, our protocol is lightweight
in computation and communication complexity and has a bounded convergence time. Finally, by implementing
the PROACtive protocol in the network simulator ns2, we confirm that after applying routing protocols
(e.g., AODV) on the topology generated by the PROACtive protocol, the network goodput can be improved
significantly in different network scenarios. Further, the protocol is scalable due to its low overhead and
fast convergence, and performs well even in networks with high node mobility. Therefore, the PROACtive
protocol is a promising and feasible solution for wireless multi-hop networks to enhance their resilience to
more sophisticated node misbehavior.

3.3.2  Resilience Capacity and PROACtive Protocol

Our contributions include the rigorous definition and analysis of resilience capacity, necessary conditions
for probabilistic k-connectivity, and the design of the PROACtive protocol.

To define the resilient capacity of a network rigorously, let $(\Omega, \mathcal{F}, \Pr)$ be the probability
space on which the random link connection of mobile nodes is defined. In particular, $\Omega$ is the sample
space consisting of all the possible topologies $G$ of a network, $\mathcal{F}$ is a $\sigma$-field on
$\Omega$, and $\Pr$ is a probability measure on $\mathcal{F}$. From graph theory, we know that the
connectivity of a graph $G$, denoted by $\kappa(G)$, is the maximum $k$ such that $G$ is $k$-connected
[25]. Due to node mobility and random node behavior, the topology of a given wireless multi-hop network
changes dynamically at all times, which results in varying connectivity. Thus, the connectivity of a dynamic
network can be treated as a random variable defined on $\Omega$, and the probabilistic $k$-connectivity of a
network can be defined by $\Pr(\{G \in \Omega : \kappa(G) = k\})$, or simply $\Pr(\kappa(G) = k)$, for $G$
as the geometric random graph model of the network. With the above notation, we define the resilient
capacity as follows.

Definition 2. Given a wireless multi-hop network, let its topology be represented by a geometric random graph
$G_{N,r}(N_m^0)$, where $N_m^0$ is the initial number of destructive nodes in the graph. For a connectivity
preference $0 < \psi_0 < 1$, the resilient capacity of the (topology) graph is defined by

$$\Lambda(\psi_0, G) \triangleq \max\{N_m^* - N_m^0,\ 0\} \tag{36}$$

where $N_m^* = \max\{N_m : \Pr(\kappa(G_{N,r}(N_m)) = k) > \psi_0\}$.


Figure 12: The relation between the resilience capacity and probabilistic k-connectivity.

The physical meaning of the resilient capacity is the maximum number of extra destructive nodes that a
topology can accommodate such that the topology is still k-connected with a certain probability, as
illustrated in Figure 12. If $N_m^* < N_m^0$, we define $\Lambda(\psi_0, G)$ as 0, implying that no
cooperative node can become destructive without degrading the probabilistic k-connectivity below a certain
level. In fact, the resilient capacity also measures the maximum number of extra misbehaving nodes that can
be excluded from the topology.
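
As a minimal sketch of how this definition could be evaluated numerically, the Python function below scans
candidate counts of destructive nodes and returns max{N_m^* - N_m^0, 0}. The callable `prob_k_connected` is
a hypothetical stand-in for an estimate of the probabilistic k-connectivity as a function of the number of
destructive nodes (for instance, obtained from simulation); it and all other names are assumptions made for
illustration and are not part of the report.

```python
from typing import Callable


def resilient_capacity(prob_k_connected: Callable[[int], float],
                       n_m0: int, psi0: float, n_total: int) -> int:
    """Return max{N_m^* - N_m^0, 0} for a given connectivity preference psi0.

    prob_k_connected(n_m) is an assumed, user-supplied estimate of
    Pr(kappa(G) = k) when the graph contains n_m destructive nodes.
    """
    n_star = -1  # stays -1 if no count of destructive nodes is feasible
    for n_m in range(n_total + 1):
        # Strict inequality, following the threshold as printed in the definition.
        if prob_k_connected(n_m) > psi0:
            n_star = n_m
    return max(n_star - n_m0, 0)
```

With a toy curve such as `lambda n_m: 1.0 - n_m / 10.0`, a preference of 0.75, and one initial destructive
node, the function reports that one additional destructive node can be tolerated.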

However, it is worth noting that whether a node can establish reliable connections to other nodes depends
on whether the node has cooperative adjacent nodes that operate normally on both control and data planes.
Let $D_c(\omega)$ be the number of cooperative adjacent nodes of node $\omega$, called the cooperative degree
of $\omega$. We define $\theta(G_{N,r}(N_m))$ (or simply $\theta(G)$) as the minimum cooperative degree of a
graph $G_{N,r}(N_m)$, i.e., $\theta(G) = \min\{D_c(\omega), \forall \omega \in G\}$. As a result, we have

Proposition 1. For a wireless multi-hop network represented by $G_{N,r}(N_m)$, let $\mu$ be the average
number of nodes in a node's transmission range. Suppose $N \gg 1$; then for any positive integer $k \ge 1$,

$$\Pr(\kappa(G) = k) \approx \Pr(\theta(G) \ge k) \ge 1 - N\,\frac{\Gamma(k, \mu(1 - P_m))}{\Gamma(k)}, \tag{37}$$

where $\Gamma(h) = (h-1)!$ and $\Gamma(h, x) = (h-1)!\, e^{-x} \sum_{i=0}^{h-1} x^i / i!$ are the complete and
incomplete Gamma functions, respectively.
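
The bound in Proposition 1 can be explored numerically. The sketch below assumes that (37) reads
1 - N * Gamma(k, mu(1 - P_m)) / Gamma(k), where for integer k the ratio Gamma(k, x) / Gamma(k) equals the
probability that a Poisson variable with mean x is smaller than k. This reading of the printed bound, as well
as the function and parameter names, are assumptions made for illustration.

```python
import math


def prob_fewer_than_k(k: int, mean: float) -> float:
    """Pr(Poisson(mean) < k), i.e. Gamma(k, mean) / Gamma(k) for integer k >= 1."""
    return math.exp(-mean) * sum(mean ** i / math.factorial(i) for i in range(k))


def k_connectivity_lower_bound(n: int, k: int, mu: float, p_m: float) -> float:
    """Assumed form of the lower bound on Pr(theta(G) >= k), clipped at zero.

    n   : network size N
    mu  : average number of nodes in a node's transmission range
    p_m : fraction of misbehaving nodes, so mu * (1 - p_m) is the mean
          number of cooperative neighbors
    """
    tail = prob_fewer_than_k(k, mu * (1.0 - p_m))
    return max(0.0, 1.0 - n * tail)
```

Evaluating the function for a fixed N, k, and mu while increasing p_m shows the bound decreasing, which is
the monotonicity used in the discussion of misbehaving nodes.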

Remark 3. Proposition 1 implies that a necessary condition for a network to be k-connected is that every node
has at least k cooperative adjacent nodes. Thus, (37) provides a useful tool for designing a topology that is
k-connected w.h.p. with a localized and distributed algorithm. In addition, from (37), we know that
$\Pr(\kappa(G) = k)$ is a decreasing function of $P_m$, which implies that the more misbehaving nodes a
network has, the harder it is for the network to keep its topology k-connected w.h.p. By using this result,
we are able to determine the resilient capacity, presented next.

Proposition 2. Given a network modeled by $G_{N,r}(N_m)$, let $\mathcal{N}_c$ be the set of cooperative nodes
in $G$ with $N_c = |\mathcal{N}_c|$. Let the topology containing all cooperative nodes be denoted by
$G^{-}_{N_c,r}(0)$, and let $G'_{N',r}(N'_m)$ denote any topology containing both cooperative and misbehaving
nodes for $0 < N'_m \le N_m$ and $N' = |\mathcal{N}'| = N_c + N'_m$. Then
$\Lambda(\psi_0, G^{-}) \ge \Lambda(\psi_0, G')$ holds for any $0 < \psi_0 < 1$ and $k = 1$.

Remark 4. The result of Proposition 2 implies that the resilient capacity is maximized when the generated
topology contains only and all cooperative nodes of the original network. The above analysis also provides
new insight into the trade-off between eliminating destructive nodes and tolerating them: the more existing
destructive nodes are deleted from the topology, the more new destructive nodes can be accommodated.
Nevertheless, under the constraint of connectivity, the maximum number of sustainable misbehaving nodes is
upper bounded, so not all existing misbehaving nodes can be eliminated if the resilient capacity is already
zero.

With the above understanding, it is clear that it is theoretically possible to design an optimal routing
protocol that achieves the maximum resilience capacity under connectivity constraints. The details of our
protocol are not discussed here. Instead, we highlight our results and observations. To evaluate the
performance of our solution, we implement the PROACtive protocol in the simulation tool ns2 and make three
modifications to the existing AODV module. First, promiscuous mode is supported such that every node can
measure others' cooperativities; second, RREQ and RREP messages are distributed only among neighbors such
that path selections are controlled within the generated topology; third, the destructive effect is
introduced by making nodes configurable to randomly drop data packets that they are supposed to forward.

Figure 13(a) shows a network without any topology control. Figure 13(b) shows the topology
generated by the PROACtive protocol, in which cooperative and destructive nodes are represented by solid dots
and circles, respectively. From the figure, we can see that the topology excludes most of the destructive
nodes while keeping most of the cooperative nodes connected. To highlight the difference from other topology
control protocols, the topology generated by the K-Neigh protocol (Phase 1 only, with K = 9) [47] is shown in
Figure 13(c). Unsurprisingly, all misbehaving nodes are included in Figure 13(c) because the neighbor
selection in K-Neigh is based only on the distance between nodes.

To test connectivity, we use the DFS (depth-first search) algorithm to calculate the maximum number of nodes
that can be removed without partitioning the network. The probabilistic k-connectivity is then calculated as
the ratio between the number of k-connected topologies and the number of all topologies tested. In
Figure 14(a), we can see that the k-connectivity probabilities of the generated topologies stay above 0.9
when N > 700 for both Pm = 10% and Pm = 40%, when the original networks are almost physically k-connected.
Nevertheless, when N < 500, the k-connectivity probabilities for generated topologies and original networks
decrease dramatically, even down to 0.

(a) No topology control  (b) Topology generated by PROACtive  (c) K-Neigh with K = 9 (Phase 1)

Figure  13:  The  illustration  of  the  topology  generated,  compared  with  the  original  network  (circles:  misbehaving 
nodes,  dots:  cooperative  nodes). 


Further, we observe that k-connectivity can hardly be preserved if Pm is too high, e.g., Pm > 30%, as shown
in Figure 14(b). Another observation is that the average (node) degree is reduced considerably in the
generated topologies, as shown in Figure 14(c), and it decreases slightly with the misbehaving ratio Pm
because there is less chance of finding enough cooperative neighbors. Recall that $\mu = \Theta(\log N)$ is a
condition for connectivity. From Figure 14(c) we can see that the average degree of the generated topologies
is asymptotically greater than log N (given that the original networks are k-connected), which satisfies the
second condition of the connectivity constraint and also implies the effectiveness of our neighbor
cooperativeness threshold.
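
The connectivity test described above can be illustrated with a small, self-contained sketch. It checks
k-connectivity by brute force: it removes every set of fewer than k nodes and runs a depth-first search on
what remains. This exhaustive check is only practical for small graphs and is an assumption made for
illustration; it is not claimed to be the exact procedure used in the simulations. Graphs are given as
hypothetical adjacency dictionaries mapping a node to the set of its neighbors.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """Depth-first search over the nodes of adj that are not in removed."""
    alive = [u for u in adj if u not in removed]
    if not alive:
        return False
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(alive)


def is_k_connected(adj, k):
    """True if the graph has more than k nodes and survives any removal of fewer than k nodes."""
    if len(adj) <= k:
        return False
    for size in range(k):
        for removed in combinations(adj, size):
            if not is_connected(adj, set(removed)):
                return False
    return True


def probabilistic_k_connectivity(topologies, k):
    """Fraction of the tested topologies that are k-connected."""
    return sum(is_k_connected(adj, k) for adj in topologies) / len(topologies)
```

For example, a 4-node ring is 2-connected but not 3-connected, while a 3-node path is only 1-connected, so
the ratio over these two topologies for k = 2 is one half.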


(a) k-connectivity vs. N  (b) k-connectivity vs. Pm  (c) Average degree vs. N

Figure 14: The preservation of k-connectivity in the generated topology.


3.3.3  Summary 

The main objective of this study is to find recovery strategies or, more importantly, to proactively deliver
data in a timely manner. Our definition of resilience capacity is therefore the cornerstone of this
objective, because we need a rigorously defined metric for optimal design. Our results have demonstrated the
following features:

•  Localized and distributed algorithm: This protocol is fully distributed and requires local topology
formation only.

•  Preservation of k-connectivity: This protocol can preserve the connectivity of the generated topology
w.h.p. (> 0.9) if the underlying network is physically k-connected.

•  Acceptable false positive (negative) ratio: This protocol can avoid the exclusion of cooperative nodes,
called false positives, and the inclusion of destructive nodes, called false negatives.

•  Low overhead and fast convergence: This protocol is lightweight in computation and communication
complexity and has a bounded convergence time.

•  Interoperability with routing protocols: This protocol is interoperable with routing protocols to provide
graceful performance degradation.

3.4  Network  Devolution  Under  Random  Failures 
3.4.1  Objectives  and  Approaches 

A prerequisite for any communication in multi-hop networks is that the underlying topology be connected.
Although much work has been done to provide theoretical guidance on achieving asymptotic full connectivity,
full connectivity is impractical to maintain at all times as wireless multi-hop networks scale larger and
larger. Thus, we find the fraction of nodes in the largest connected component (giant component) to be a
better metric for evaluating the resilience of a large network to failures; that is, the larger the giant
component is, the more resilient the network is. Consequently, understanding the network devolution process
as the number of node failures increases, especially the critical time when the network experiences
topological transitions, is important in both theory and practice.

In this subtopic, we focus on the following question: for a large-scale wireless multi-hop network in the
presence of random failures, when does the network change its behavior from an almost connected phase to a
fully partitioned phase? Here a network is said to be almost connected if there exists a giant component
that is composed of most of the surviving nodes with high probability, while a network is fully partitioned
if no such giant component exists asymptotically almost surely. To tackle this problem, we couple a network
devolution process with a continuum percolation process in a geometric random graph with uniform node
distribution. By using the concept of percolation probability, we first define two metrics, the last
connection time and the first partition time, to characterize the critical phase transition time. The former
is the last time at which a network is almost connected and the latter is the first time at which a network
is fully partitioned. Then we analyze the conditions under which a geometric random graph does (not) have a
giant component of surviving nodes.

Our approach takes the following steps. We first map the percolation process defined on the continuous
plane onto a discrete lattice, whose edges are declared open if certain properties of the Poisson process in
their vicinity are met (closed otherwise). In the discrete lattice, we then investigate the condition under
which infinite open paths (composed of open edges) exist with positive probability. With a careful definition
of the open edge in the lattice, a reverse mapping can be carried out back to the continuous plane so that
infinite open paths on the discrete lattice indicate connected components on the continuous plane. Finally,
we obtain the continuum percolation conditions, which enable us to derive the bounds of the critical phase
transition time, i.e., $t_c(n)$ and $t_p(n)$, for given survival functions.

Main Results: To the best of our knowledge, this is one of the first studies of the network devolution
problem. Therefore, our definitions and problem formulation lay a good foundation for future research.

In  order  to  understand  the  critical  phase  transition  time  during  network  devolution,  we  first  define  almost 
connected  and  fully  partitioned  networks  as  follows 

Definition 3. Let $G(\mathcal{H}_{\lambda_0, s_n}, r_n)$ be a geometric random graph, in which every point is
associated with the same survival function $S(t)$. Let $\lambda_1(t) = \lambda_0 S(t)$; then the network
represented by $G(\mathcal{H}_{\lambda_0, s_n}, r_n)$ is said to be almost connected if
$P_\infty(\lambda_1(t)) > 0$, and fully partitioned if $P_\infty(\lambda_1(t)) = 0$, where $P_\infty(\cdot)$
is the percolation probability.

Next  we  define  two  new  metrics  called  the  last  connection  time  and  first  partition  time. 

Definition 4. With the same conditions given in Definition 3, the last connection time is defined by

$$t_c(n) = \sup\{t > 0 : P_\infty(\lambda_1(t)) > 0\}, \tag{38}$$

where $\lambda_1(t) = \lambda_0 S(t)$. The first partition time is defined by

$$t_p(n) = \inf\{t > 0 : P_\infty(\lambda_1(t)) = 0\}. \tag{39}$$

Definition 5. The critical phase transition time, denoted by $T_c$, is the critical time point above which
$G(\mathcal{H}_{\lambda_0, s_n}, r_n)$ is disconnected a.a.s. (sub-critical) and below which
$G(\mathcal{H}_{\lambda_0, s_n}, r_n)$ is connected a.a.s. (super-critical), that is,

$$\lim_{n \to \infty} \Pr(G \text{ is connected}) = \begin{cases} 1, & \text{if } t < T_c, \\ 0, & \text{if } t > T_c. \end{cases} \tag{40}$$

The exact value of $T_c$ is unknown, but it is expected to be bounded by $t_c(n)$ and $t_p(n)$ from below
and above, respectively, based on our definitions of $t_c(n)$ and $t_p(n)$.

3.4.2  Network Partition Problem and Theoretical Limits

Now  we  formulate  the  problem  addressed  in  this  paper  as  the  Network  Partition  Time  (NPT)  problem. 

Definition 6. (NPT problem): For a large-scale wireless multi-hop network represented by a geometric random
graph $G(\mathcal{H}_{\lambda_0, s_n}, r_n)$, every node is assumed to be independently associated with a
common survival function $S(t)$. Given that the network is fully connected at the initial time $t = 0$, find

1.  the relations among $n$, $\lambda_0$, $r_n$, and $S(t)$ that would be sufficient to guarantee that the
network is almost connected or fully partitioned, respectively;

2.  the upper limit of $t_c(n)$ and the lower limit of $t_p(n)$, such that the critical phase transition time
$T_c$ can be bounded by these limits.

The results of this problem reveal that when time $t < t_c(n)$, the network is guaranteed to be almost
connected (super-critical), while $t > t_p(n)$ is sufficient for the network to be fully partitioned
(sub-critical). We expect the bounds of the critical phase transition time (i.e., $t_c(n)$ and $t_p(n)$) to
be tight so that the phase transition is sharp and the period of phase transition (i.e., the gap between
$t_c(n)$ and $t_p(n)$) converges to 0 as fast as possible. Nevertheless, in practice, a longer period of
phase transition might be preferable to provide a smooth degradation of connectivity.

Our  theoretical  results  can  be  summarized  as  follows: 

Theorem 4. Given a graph $G(\mathcal{H}_{\lambda_0, s_n}, r_n)$, assume
$\mu_0 = \lambda_0 \pi r_n^2 = \Theta(\log n)$ and the degree bound $K = (1 + \epsilon_n)\mu_0$, where
$\epsilon_n$ is an arbitrary increasing function of $n$. There exists a positive constant $c_\epsilon$ such
that if the survival function $S(t)$ satisfies

(41) 

where  fn  =  1  —  exp(—  2c% 1,1 "),  then  G(Ti\0  Sn,  rn )  is  in  the  super-critical  phase. 

^n/^0  5 

Theorem 5. Given a graph $G(\mathcal{H}_{\lambda_0, s_n}, r_n)$, assume $\mu_0 = \Theta(\log n)$ and
$K = (1 + \epsilon_n)\mu_0$. There exists a positive constant $c_\epsilon$ such that if the survival function
$S(t)$ satisfies:


cernV  Aq  in  n 


where  an^  T(x,y)  is  the  incomplete  Gamma  function,  then  G(H.\0tSn,rn )  is  in  the  sub-critical 

phase. 

Remark 5. The assumptions on $\mu_0$ and $K$ are needed to achieve initial full connectivity, which is the
condition  of  the  NPT  problem.  Further,  they  also  guarantee  fx  >  ■>  such  that  ln(  \/ftFx  —  1)  is  a  real  number. 
In fact, the form of $K$ indicates that the impact of interference has to be sufficiently small so that the
degree bound can be large enough to support percolation as $n \to \infty$, which is consistent with the
result proved in [48].


S(t)> 


v/5(ln  18  —  ln(l  —  15^n)) 
c€rnV  Ao  In  n 


Next,  the  following  corollaries  answer  the  second  part  of  the  NPT  problem,  providing  the  theoretical  bounds 
on  the  critical  phase  transition  time. 

Corollary 1. (Limits of $t_c(n)$ and $t_p(n)$ with light-tailed $S(t)$): Assume the survival function
$S(t) = e^{-at}$ (exponential), where the positive $1/a$ represents the mean lifetime of a node. Then the
upper limit of the last connection time $t_c(n)$ is

$$t_c(n) = \frac{1}{a}\ln(\ln n) + c_1 \sim \Theta(\log(\log n)), \tag{43}$$

where  c\  =  4(ln(Cey4L)  _  ln(v/51n  yxiix^))  am d  c  —  The  lower  limit  of  first  partition  time  tp(n )  is, 

$$t_p(n) = \frac{1}{a}\ln(\ln n) + c_2 \sim \Theta(\log(\log n)), \tag{44}$$


where  c2  =  ^(ln (ceV/f)  -  ln(ln 

Corollary 2. (Limits of $t_c(n)$ and $t_p(n)$ with heavy-tailed $S(t)$): Assume
$S(t) = (t/\eta)^{-\rho}$ (heavy-tailed Pareto, $\rho > 1$) with mean $\rho\eta/(\rho - 1)$. Then the upper
limit of $t_c(n)$ is

$$t_c(n) = c_3 (\ln n)^{1/\rho} \sim \Theta((\log n)^{1/\rho}), \tag{45}$$

where  c3  =  and  c  ~  EW  The  lower  limit  °/^(n)  is 

$$t_p(n) = c_4 (\ln n)^{1/\rho} \sim \Theta((\log n)^{1/\rho}), \tag{46}$$

where  ca  =  vi _  _ _ U/p 
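
The orders of growth in the two corollaries can be checked with a short derivation. As a simplifying
assumption, suppose the condition on the survival function takes the form $S(t) \ge B/\ln n$, where $B > 0$
is a hypothetical constant that lumps together all factors not depending on $t$ (this is natural when
$\mu_0 = \lambda_0 \pi r_n^2 = c \ln n$, since then $r_n\sqrt{\lambda_0 \ln n}$ is proportional to $\ln n$).
Solving for $t$ under the exponential and Pareto survival functions gives

```latex
\begin{align*}
e^{-at} \ge \frac{B}{\ln n}
  &\iff t \le \frac{1}{a}\ln(\ln n) - \frac{1}{a}\ln B, \\
\left(\frac{t}{\eta}\right)^{-\rho} \ge \frac{B}{\ln n}
  &\iff t \le \eta\, B^{-1/\rho}\, (\ln n)^{1/\rho}.
\end{align*}
```

The first bound grows as $\Theta(\log(\log n))$ and the second as $\Theta((\log n)^{1/\rho})$, which matches
the forms stated for the light-tailed and heavy-tailed cases. The constant $B$ is illustrative only and does
not reproduce the exact constants $c_1$ through $c_4$.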

Remark 6. A premise used in the above theorems is $\mu_0 = c \ln n = \Theta(\log n)$, where $c$ is some
constant such that the network is fully connected initially. In particular, Xue and Kumar proved in [49] that
5.1774 log n is required for a.a.s. connectivity, and this threshold was further improved by Balister et al.
in [50] to 0.5139 log n (also see [51]). However, in our simulations, we find that 0.5139 log n is far from
sufficient to achieve an initially connected random topology, and 5.1774 log n is actually a "good" threshold
for connectivity.
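
To make the premise concrete, the sketch below computes the transmission radius needed so that the mean
number of nodes in range equals c ln n for a given node density, using the relation
mu_0 = lambda_0 * pi * r_n^2 from the theorems above. The function names and the example values are
assumptions chosen for illustration.

```python
import math


def transmission_radius(n: int, density: float, c: float) -> float:
    """Radius r_n such that density * pi * r_n**2 = c * ln(n)."""
    return math.sqrt(c * math.log(n) / (math.pi * density))


def mean_degree(density: float, radius: float) -> float:
    """Mean number of nodes within the transmission range, mu_0 = density * pi * radius**2."""
    return density * math.pi * radius ** 2


if __name__ == "__main__":
    # Compare the two constants discussed in the remark for a 1000-node network of unit density.
    for c in (0.5139, 5.1774):
        r = transmission_radius(1000, 1.0, c)
        print(f"c = {c}: r_n = {r:.3f}, mu_0 = {mean_degree(1.0, r):.2f}")
```

Because the radius scales with the square root of c, moving from the smaller constant to the larger one
increases the radius by a factor of roughly three.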

Remark 7. From reliability engineering, we know that many lifetime distributions (e.g., exponential,
log-normal, Pareto, Weibull) are either light-tailed or heavy-tailed according to the decay speed of their
tails. Since the exponential distribution is the only distribution with a constant failure rate and applies
naturally to modeling memoryless lifetimes, it is used to represent light-tailed survival functions, while
the Pareto distribution is used to represent heavy-tailed survival functions when node lifetime follows a
power law or has very large variance.

Remark 8. A network with Gaussian node distribution is more resilient to random failures, in terms of a
graceful degradation of the relative giant component size and a longer network survival time, compared with
its counterpart with uniform node distribution. This finding implies that the node distribution can also be
an important factor in determining the overall network resilience to random failures.

3.4.3  Simulation  Results:  Visible  Network  Devolution 

To emulate the devolution process, nodes fail one by one in increasing order of their lifetimes. Upon each
node failure, we use a depth-first search (DFS) algorithm to record all components induced by surviving nodes
and calculate the giant component size S (i.e., the number of surviving nodes in the largest component). To
characterize the phase transition phenomenon, the relative giant component size is defined by Sr = S/n',
where n' is the number of remaining surviving nodes.
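
The procedure above can be sketched in a few lines of Python. The sketch fails nodes in increasing order of
lifetime and, after each failure, uses an iterative depth-first search to find the largest component among
the surviving nodes. The adjacency dictionary, the lifetime dictionary, and the function names are
assumptions for illustration; the actual simulation setup is not reproduced here.

```python
def giant_component_size(adj, alive):
    """Size of the largest connected component induced by the surviving nodes."""
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def devolution_trace(adj, lifetimes):
    """Return a list of (failure time, S, S_r) recorded after each node failure.

    adj       : dict mapping each node to the set of its neighbors
    lifetimes : dict mapping each node to its lifetime
    """
    alive = set(adj)
    trace = []
    for node in sorted(lifetimes, key=lifetimes.get):
        alive.discard(node)
        if not alive:
            break  # no surviving nodes, so S_r is undefined
        s = giant_component_size(adj, alive)
        trace.append((lifetimes[node], s, s / len(alive)))
    return trace
```

On a 5-node path whose middle node fails first, the first record shows S = 2 and Sr = 0.5, since the four
survivors split into two components of equal size.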

Figure 15 illustrates an example of the topological devolution process of a graph of 1000 nodes, where solid
dots and circles represent surviving nodes and failed nodes, respectively. The survival function is Pareto
with the parameters set above. By using (45) and (46) and choosing $c_\epsilon = 2.5$, we have
$t_c(n) = 733.8$ and $t_p(n) = 2041$. As expected, when $t < t_c(n)$, the topology formed by the remaining
nodes is almost connected with a single giant component, as shown in Figure 15(a). On the contrary, when
$t > t_p(n)$, the network is fully partitioned and has only several small components, as shown in
Figure 15(c). Figure 15(b) shows the topology in the period of phase transition, i.e.,
$t_c(n) < t < t_p(n)$, where the network is disconnected into parts but has one component larger than the
others.

(a) t = 729.5 < tc(n)  (b) tc(n) < t = 1204.4 < tp(n)  (c) t = 2193.4 > tp(n)

Figure  15:  Snapshots  of  the  graph  devolution  process  at  different  times. 


Figure 17: The phase transition and critical time bounds with Pareto survival functions ($\rho = 2.0$, $\eta = 500.0$).


Figures 16 and 17 show clearly how the relative giant component size (Sr = S/n') decreases when the
network experiences increasing random failures. We summarize our observations as follows. First, the period
of phase transition is bounded by the theoretical limits of tc(n) and tp(n) in all simulation scenarios,
which confirms the correctness of our analytical results. Second, as expected, the larger the network size n
is, the sharper the phase transition is, which is true for both light-tailed and heavy-tailed survival
functions. Third, compared with the actual value of the giant component size S (or the ratio S/n), it is
clear that the relative giant component size, i.e., Sr = S/n', is a more appropriate metric for indicating
the phase transition phenomenon in the devolution process due to random failures. Finally, a surprising
observation is that, given the same n and average node lifetime, the network with the Pareto survival
function decomposes substantially faster than the network with the exponential survival function. To explain
this phenomenon, note that the variance of Pareto-distributed lifetimes is much larger than that of
exponentially distributed lifetimes. This implies that the majority of Pareto-distributed lifetimes have to
be short enough to compensate for a small number of very long lifetimes, in order to achieve the same average
lifetime as exponentially distributed lifetimes. Consequently, more nodes with Pareto-distributed lifetimes
fail earlier than the nodes with exponentially distributed lifetimes.


4  Correlated  failures  and  their  propagation  in  inhomogeneous  networks 

4.1  Failure  Propagation  via  Multi-Hop  Communications 

During the course of studying fundamental limits of network responses to attacks, we found that our
understanding of network architecture is very limited and that there is hardly any literature on the topic.
For instance, once a failure occurs or is detected, how fast can it become known (to one or more nodes) in
the network? The problem is important for two reasons: first, it is critical to the design of recovery
strategies, because a network should be designed to be sufficiently robust against the impact of such
failures; second, the network (and its protocols) needs to be designed intelligently such that the
information about a failure can be made available to as many nodes as possible within the shortest time
period. Therefore, a fundamental problem is: what is the speed of information propagation?

4.1.1  Objectives  and  Approaches 

In the pioneering paper [52], Zheng shows that there is a constant upper bound W on the information diffusion
rate and that the network is able to achieve a constant diffusion rate, regardless of the network population,
in both extended and dense networks. Achieving W requires three conditions: i) every node uses an optimal
transmission radius R, ii) the transportation distance of a packet is a multiple of R, and iii) the relay
nodes are aligned with separation distance R. If any of these conditions is not met, W is unreachable.

However, a few interesting questions remain unanswered. First, if the packet transportation distance is
known and not equal to a multiple of R, what is the best propagation strategy for the packet to achieve the
fastest delivery? Since W is unreachable, is there a tighter upper bound on the speed? Second, when
delivering a packet, we care about both how fast and how well the packet is delivered, that is, whether all
the intended recipients can receive the packet successfully. When delivery of the packet without missing any
recipient is required, what is the upper bound on the speed under this constraint? Third, if the optimal
transmission radius R is used but the relay nodes are not perfectly located, what is the gap between the
actually achieved speed and the desired upper bound W? We attempt to answer these three questions in this
work.

Main Results: Our objective is to find the speed of propagation in an arbitrary network in which nodes are
not placed evenly and have different transmission ranges.

Before we investigate the speed of information propagation, it is necessary to understand how information
propagates in multihop wireless networks. An illustration is shown in Figure 18, in which a packet is
originated by node $v_0$ at time zero. Node $v_0$ chooses a transmission radius $r_{v_0}$ and sends out the
packet. The packet is received by all the relay nodes in $A_{v_0}$ by the end of $v_0$'s transmission. These
relay nodes then forward the packet by rebroadcasting it. Figure 18(a) depicts the area reached by the packet
after two hops.

Figure 18: Information propagation in multihop networks.

Denote $V(t)$ as the set of nodes that have received the packet by time $t$ and
$\tilde{V}(t) \subseteq V(t)$ as the subset that has forwarded the packet. The total area that the packet has
reached by time $t$ is expressed as $A(t) = \bigcup_{v_i \in \tilde{V}(t)} A_{v_i}$. In addition, let
$\mathcal{L}_\varphi$ denote the line starting from $v_0$ toward the direction $\varphi \in [0, 2\pi)$ and
$\mathcal{L}_\varphi(t) = \mathcal{L}_\varphi \cap A(t)$. In Figure 18(b), $\mathcal{L}_\varphi(t)$ is the
line segment $oz$. The Information Propagation Speed in the direction $\varphi$ is then defined to be

$$w_\varphi(t) = \frac{|\mathcal{L}_\varphi(t)|}{t}, \tag{47}$$

where $|\mathcal{L}_\varphi(t)|$ is the length of $\mathcal{L}_\varphi(t)$.
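
A small geometric sketch may help make this definition concrete. It assumes that each forwarding node covers
a disk, that the reached area is the union of these disks, and that the speed in a given direction is the
length of the covered segment starting at the source divided by the elapsed time. The disk representation,
the choice to measure only the contiguous segment from the source, and all names are assumptions made for
illustration.

```python
import math


def reach_along_direction(origin, disks, phi):
    """Length of the contiguous covered segment from origin along direction phi.

    origin : (x, y) position of the source node
    disks  : list of ((cx, cy), radius) coverage disks of forwarding nodes
    phi    : direction angle in radians
    """
    dx, dy = math.cos(phi), math.sin(phi)
    intervals = []
    for (cx, cy), r in disks:
        mx, my = origin[0] - cx, origin[1] - cy
        b = mx * dx + my * dy
        disc = b * b - (mx * mx + my * my - r * r)
        if disc < 0:
            continue  # the ray's supporting line misses this disk
        root = math.sqrt(disc)
        lo, hi = -b - root, -b + root
        if hi <= 0:
            continue  # the disk lies behind the source
        intervals.append((max(lo, 0.0), hi))
    reach = 0.0
    for lo, hi in sorted(intervals):
        if lo > reach:
            break  # gap in coverage: stop at the end of the contiguous segment
        reach = max(reach, hi)
    return reach


def propagation_speed(origin, disks, phi, t):
    """Covered length along phi divided by the elapsed time t."""
    if t <= 0:
        raise ValueError("elapsed time must be positive")
    return reach_along_direction(origin, disks, phi) / t
```

For two unit disks centered at (0, 0) and (1.5, 0), the covered length from the origin is 2.5 along the
positive x direction and 1.0 along the negative x direction.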

As  the  first  contribution,  we  show  that  there  is  another  optimal  transmission  radius  other  than  R  if  the  packet 
transportation  distance  is  not  a  multiple  of  R.  We  note  that  in  broadcast  communications,  as  the  locations  of 
packet  recipients  may  not  be  known  in  advance,  R  is  the  best  transmission  strategy.  In  unicast  communications, 
however,  the  location  of  the  packet  recipient  may  be  known.  If  the  known  transportation  distance  is  not  a  multiple 
of  R,  another  transmission  radius  that  optimally  fits  the  specific  distance  should  be  used.  Interestingly,  we  find 
that  there  is  a  unified  optimal  transmission  radius  and  speed  upper  bound  in  large  wireless  networks. 

As  the  second  contribution,  we  determine  the  speed  upper  bound  under  the  constraint  of  guaranteeing  a  given 
level  of  packet  delivery  satisfaction.  We  examine  two  different  noise  models.  In  the  first  model,  the  noise  in  the 
network  is  determined  by  the  background  Gaussian  noise.  We  show  that  there  exists  a  threshold  node  density, 
above which there is a constant speed upper bound. In the second model, interference is the dominant source
of noise. We show that there also exists a threshold node density, above which, however, the speed upper bound
decreases  to  zero. 

Based on our theoretical analysis, we find that, given the parameter $\gamma$, there exists an optimal transmission
radius $R_\gamma(\lambda)$ that may achieve the maximum information propagation speed $W_\gamma(\lambda)$ in a network with node
density $\lambda$. However, as we have discussed earlier, actually achieving this maximum speed requires an additional
condition that all the relay nodes are aligned and separated from each other by the distance $R_\gamma(\lambda)$. Since the
nodes are randomly distributed, it is impossible to find these perfectly located relay nodes when $\lambda < \infty$. There
is always a gap between the actually achievable speed $w_\varphi(t)$ and the bound $W_\gamma(\lambda)$.

As the third contribution, we quantify the gap between the actually achievable speed and the desired upper
bound.  We  prove  that  a  packet  propagates  omnidirectionally  in  large  wireless  networks  and  the  speed  gap  shrinks 
as  the  node  density  increases.  Furthermore,  we  show  that  in  both  noise  models,  there  exists  a  threshold  node 
density,  below  which  the  gap  is  bounded  by  constants  and  above  which  the  gap  converges  to  zero  exponentially. 
Since  this  result  is  more  practical  and  useful  to  DTRA  missions,  we  present  the  main  results  here. 


By definition, the actual information propagation speed is measured by $w_\varphi(t) = |\mathcal{L}_\varphi(t)|/t$. Due to the randomness
of node locations, this speed may be faster or slower when the packet travels through different subareas
in the network. To evaluate $w_\varphi(t)$ without introducing the subarea bias, we define the long-term speed in the
direction $\varphi$ to be

$$w_\varphi = \lim_{t \to \infty} w_\varphi(t) = \lim_{t \to \infty} \frac{|\mathcal{L}_\varphi(t)|}{t} \tag{48}$$

Since every node uses the same optimal transmission radius $R_\gamma(\lambda)$, the 1-hop transmission time
$\tau = \frac{L}{B \log_2(1 + \beta R_\gamma^{-\alpha}(\lambda))}$ is the same for every node. Thus, Eq. (48) is rewritten as

$$w_\varphi = \lim_{m \to \infty} \frac{Z_m}{m\tau} = \lim_{m \to \infty} \frac{\sum_{i=1}^{m} \rho_i}{m\tau} = \frac{\rho}{\tau} \tag{49}$$

where $Z_i = |oz_i|$, $\rho_i = Z_i - Z_{i-1}$, and $\rho = \lim_{m \to \infty} \frac{\sum_{i=1}^{m} \rho_i}{m} = E[\rho_i]$.
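
As a rough numerical illustration of Eq. (49), the sketch below evaluates the 1-hop time and the long-term speed from a list of per-hop progresses. It assumes that $L$ is the packet length in bits and $B$ the bandwidth in Hz, which the text does not state explicitly; the function names and all numbers are hypothetical and chosen only for illustration.

```python
import math

def hop_time(packet_bits, bandwidth_hz, beta, radius, alpha):
    """1-hop transmission time tau = L / (B * log2(1 + beta * R^-alpha))."""
    rate = bandwidth_hz * math.log2(1.0 + beta * radius ** (-alpha))
    return packet_bits / rate

def long_term_speed(progress, tau):
    """Estimate w = rho / tau, where rho is the mean per-hop progress."""
    rho = sum(progress) / len(progress)
    return rho / tau

# Hypothetical parameters (assumptions, not values from the report).
tau = hop_time(packet_bits=8000, bandwidth_hz=1e6, beta=1e3, radius=1.0, alpha=4)
progress = [0.8, 0.9, 0.7, 1.0, 0.6]   # rho_i = Z_i - Z_{i-1} for five hops
w = long_term_speed(progress, tau)
```

With a finite list of hops this is only an estimate of the limit in Eq. (49); the estimate stabilizes as more hops are averaged.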


4.1.2  How Fast Can Failures Propagate?


First, we show that the actual information propagation speed is omnidirectional in large networks. In the long
term, a packet is disseminated to the same distance away from the source in any direction, and the frontier of the
covered area is a circle centered at the source node, as specified in the following theorem.

Theorem 6. In a network with homogeneous node distributions, $\forall\, \varphi_1, \varphi_2 \in [0, 2\pi)$, $w_{\varphi_1} = w_{\varphi_2} = w$.

Remark 9. Theorem 6 states the fact that a packet reaches the same distance away in any direction after sufficiently
long propagation time, though it can be faster or slower temporarily in one direction than another.


Figure  19  depicts  an  example  of  the  speed  comparison  in  different  directions.  As  the  packet  propagates 
farther  away,  the  speeds  in  all  directions  converge  to  the  same  value. 

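It can be proved that $w$ is determined by the node density $\lambda$, so we write $w = w(\lambda) = \rho(\lambda)/\tau$, where $\rho(\lambda)$ is the
mean per-hop progress in Eq. (49). We define the gap between the actual speed $w(\lambda)$ and its upper bound $W_\gamma(\lambda)$ as

$$\varepsilon(\lambda) = \frac{W_\gamma(\lambda) - w(\lambda)}{W_\gamma(\lambda)} = \frac{R_\gamma(\lambda) - \rho(\lambda)}{R_\gamma(\lambda)} \tag{50}$$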
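
To make the gap in Eq. (50) concrete, the following Monte Carlo sketch estimates it under strong simplifying assumptions that are ours, not the report's: each hop sees a fresh Poisson set of relays in a disc of fixed radius `radius` (overlaps between consecutive discs are ignored), the relay with the largest forward progress is chosen, and a fixed radius is used rather than $R_\gamma(\lambda)$. The function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def best_progress(density, radius, rng):
    """Largest forward progress offered by the relays inside one transmission disc."""
    n = poisson(density * math.pi * radius ** 2, rng)
    best = 0.0  # no relay ahead of the sender means no progress in this hop
    for _ in range(n):
        # Uniform point in the disc: distance R*sqrt(U), uniform angle.
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        best = max(best, r * math.cos(theta))
    return best

def speed_gap(density, radius=1.0, hops=2000, seed=0):
    """Estimate (R - rho) / R, with rho the mean per-hop progress."""
    rng = random.Random(seed)
    rho = sum(best_progress(density, radius, rng) for _ in range(hops)) / hops
    return (radius - rho) / radius

gaps = {density: speed_gap(density) for density in (2, 8, 20)}
```

In this toy setting the estimated gap shrinks as the density grows, which is the qualitative behavior described above; the sketch does not reproduce the bounds of the theorem that follows.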


The next theorem provides a quantified measurement of $\varepsilon(\lambda)$.

Theorem  7.  In  a  network  where  the  nodes  are  randomly  distributed  in  a  Poisson  point  process  with  density 
A  and  the  optimal  transmission  radius  i?7( A)  is  used,  defining  a  =  A7ri?7(A),  gi(a)  =  e“^2_1Mx  and 

92(a)  =  f(;  e~lax2dx, 

gi(a)  <  e(\)  <  g2(a) .  (51) 

Figure 19: A comparison of the packet propagation speeds in six randomly chosen directions, in which the
normalized speed is defined as the ratio of the minimum speed to the maximum speed in the six directions,
$\lambda = 30$.

Figure 20: The speed gap in the ambient-noise-dominant model, $\beta = 10^3$. Panels: (a) $\alpha = 2$, (b) $\alpha = 4$, (c) $\alpha = 6$; horizontal axis: node density.


Remark  10.  Theorem  7  provides  the  bounds  on  the  convergence  rate  of  the  speed  gap  as  the  node  density 
increases. 


Remark 11. We find that in the ambient-noise-dominant model the speed gap converges to zero exponentially
with exponent $\lambda^{1-\epsilon}$, where $\epsilon$ is an arbitrarily small positive real number. The speed gap $\varepsilon(\lambda)$ and its bounds are
shown in Figure 20 as an example.

Remark  12.  We  find  that  in  the  interference-dominant  noise  model  the  speed  gap  converges  to  zero  exponentially 
with  exponent  A^^F1  ~e\  where  e  is  an  arbitrarily  small  positive  real  number. 


Therefore, our results show that in both noise models there is a threshold node density, below which $\varepsilon(\lambda)$
is bounded by constants (the constants are determined by the choice of parameter $\gamma$) and above which $\varepsilon(\lambda)$
converges to zero exponentially, at the rates given in Remarks 11 and 12, respectively.

The study on the speed of information propagation is a new problem toward the first thrust of the original goals,
that is, the fundamental limits of network vulnerabilities in the presence of multiple failures.

4.2  Failure Propagation via Topological Analysis

To assess the health of a network, we may have access to a variety of measurements at the individual nodes. The
characterization of these measurements yields a variety of information, including information about the functionality
of the network. To cope with the onset or the aftermath of a significant attack, such as one by weapons of
mass disruption/destruction, our focus primarily lies in detecting early failures and in localizing them to infer the
degree to which the topology of a network is preserved. The power of a detection strategy lies in its versatility
and its adaptability.

4.2.1  Objectives  and  Approaches 

We take a more abstract approach of characterizing the network processes as ‘Data’ along with some ‘Characteristics’
associated with it. We present the task of identifying the process taking place in the network as a Data
Classification problem. There is extensive literature in the field of Statistical Data Learning/Classification,
which we can put to effective use when modeling the network processes in the above manner. Furthermore, we
explore the recently emerging field of Applied Topology to study our network data. Data in many practical applications
does not admit any natural geometric properties, and searching for metrics in this space might be redundant
(although searching for the right ‘features’ that characterize our data is important). It is for this reason that
analyzing the topological structure of the data becomes more apt for our situation.

Our approach is to consider a network of nodes represented by $X_i$, and each node has a measurement vector
$M_i$. We assume that the nodes $X_i$ are random samples taken from a manifold and the measurements are samples
of a vector field defined on this manifold. There is an operator which is a mapping from this vector field onto a
feature space. The operator is chosen such that the topological invariants of the feature space correspond in
some way to the underlying processes on the network. The problem statement then is three-fold:

•  Designing  the  operator  based  on  prior  knowledge  of  the  process  such  that  the  topology  of  the  feature  space 
reflects  the  characteristics  of  the  process. 

•  Identifying  or  classifying  the  underlying  processes  in  the  network  by  analyzing  the  topology,  or  if  necessary 
and  appropriate,  the  geometry  of  the  feature  space. 

•  Localizing  the  required  processes  in  the  network. 

4.2.2  Experimental  Results  and  Observations 

We experiment with the network database from the KDD 2000 contest. The database is a series of records,
where each record describes a connection. Many attacks/intrusions were emulated in a real-life network, and
information about each connection was stored as a record. In our model, we treat the set of records at each
node $X_i$ as the measurement vector $M_i$. Also, as of now, we collect all these measurement vectors and perform
"centralized" processing. Our future goal is to decentralize the processing so as to minimize the overhead on the
network. In our preliminary attempts towards designing such an operator, we employed linear discriminant
analysis. In this method, we design a linear operator from the measurement space to the feature space such that
points corresponding to different processes ("normal" connections, DoS attacks, etc.) form different clusters and
points within each cluster are packed as tightly as possible. We describe linear discriminant analysis
briefly below:

We are given a training data set $X \in \mathbb{R}^{N \times k}$, where $N$ is the number of records and $k$ is the dimension of each record.
The records correspond to $C$ different known processes, which we call classes. Therefore, the
data set can be partitioned into $C$ disjoint sets as $X = \{X_1, X_2, \ldots, X_C\}$, where $X_i$ represents the set of records
corresponding to class $i$. We represent a record in the set $X_i$ as $x^i$. The number of records in class $i$ is
represented by $N_i$. The goal is to design a linear operator $A$ so as to optimize the Fisher criterion:

$$A^* = \arg\max_A \frac{A^T S_b A}{A^T S_w A}$$

where $S_b$ is called the between-class scatter matrix and $S_w$ is called the within-class scatter matrix. The above
optimization problem reduces to a generalized eigenvalue problem and can be solved by simultaneously diagonalizing
both scatter matrices. The figures below show the effectiveness of such analysis.
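
For intuition, the sketch below works out the simplest special case of the Fisher criterion: two classes of two-dimensional points, for which the optimal direction has the well-known closed form $a \propto S_w^{-1}(m_a - m_b)$. This is not the multi-class, 41-dimensional operator used in the experiments; the toy data and all function names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def scatter(points, m):
    """Scatter matrix: sum of (x - m)(x - m)^T over 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Direction maximizing (a^T S_b a) / (a^T S_w a) for two classes."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # a = S_w^{-1} (m_a - m_b), using the closed-form inverse of a 2x2 matrix.
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

def project(points, a):
    """Map each record onto the one-dimensional feature space."""
    return [p[0] * a[0] + p[1] * a[1] for p in points]

# Toy records: both classes are spread along x and differ only in y.
class_a = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
class_b = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (3.0, 1.0)]
a = fisher_direction(class_a, class_b)
features_a, features_b = project(class_a, a), project(class_b, a)
```

The resulting direction is dominated by the y component, so the two classes fall into disjoint intervals of the feature axis even though they overlap completely along x.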


Figure  21:  Result  of  linear  discriminant  analysis. 

On the left of Figure 21, records from three different processes are plotted in three dimensions (after
reducing the dimension from 41 to 3). The right of Figure 21 shows the same records after being mapped onto
the feature space using the linear operator designed as stated above. As the figure shows, the records of two kinds
of processes which were indistinguishable originally show a considerable amount of topological variation in the
feature space. Even though they perform very well in certain cases, the linear operators as designed above form a
small subset of the vast class of operators one can design and fail to capture the non-linear discriminant features
in the measurement data. For this, we should consider the classes of non-linear maps and optimize among those
classes.

To give an idea of which non-linear operators are best suited for certain situations, consider
Figure 22. The figure to the left shows a simulated data set in which the red "blobs" shown at the top and at the bottom
correspond to the same process. The resulting feature space of our mapping should be able to group the red blobs
together and separate the blue blob. Clearly, linear operators cannot accomplish such a task. The figure to the
right shows the same data points but color coded with the feature values of a non-linear operator. The feature
space is one-dimensional in this case. The mapping of this non-linear operator $\varphi$ is shown in Figure 23.

Figure 22: Non-linear mapping.


Figure  23:  Optimization  result  with  non-linear  operator. 

It is quite apparent at this stage that non-linear features can better capture the discriminative characteristics
of the data. But it is often difficult to choose the right class of non-linear operators. Also, it is very important
and extremely useful to be able to design such an operator in a decentralized manner within the network without
collecting the data at a central place. Both of the above problems will be a major focus of our future research.

4.2.3  Clustering  Analysis  based  on  Applied  Topology 

We  further  studied  Applied  Topology.  As  seen  in  the  above  application,  we  were  looking  for  different  clusters 
in  the  feature  space  to  distinguish  between  different  processes.  Clustering  can  be  viewed  as  a  discrete  version 
of  looking  for  connected  components  in  a  continuous  space.  Such  a  feature  is  called  a  topological  feature. 
The  phrase  “studying  topology  of  a  space”  roughly  implies  studying  the  arrangement  of  points  in  that  space. 
We extend the above idea of looking for connected components, which is a zero-order feature, to higher-order
topological features such as cycles and voids. We present here a few applications where such higher order
features  can  give  us  considerable  information  about  the  underlying  process  on  which  this  topological  space  is 
described. 

We  start  with  a  practical  scenario/process  and  represent  it  using  a  topological  space.  Briefly,  a  topological 
space  describes  a  set  of  all  open  sets  so  that  we  can  define  a  notion  of  continuity.  We  then  analyze  these  spaces 
to  extract  topological  invariants  which  characterize  the  underlying  processes.  In  the  application  described  above, 
the  underlying  processes  are  the  different  kinds  of  network  connections  and  the  topological  space  is  the  feature 
space (the range of our designed operator) in which we were searching for connected components. In order to
facilitate  the  digital  computation  of  these  invariants,  we  use  homology  which  assigns  to  each  topological  space 
a  sequence  of  algebraic  objects  such  as  groups  and  vector  spaces.  The  topological  invariants  can  be  extracted 
by  performing  computations  on  these  algebraic  objects.  The  following  diagram  depicts  pictorially  the  above 
process. 


Figure  24:  Big  picture  of  applied  topology. 

Homology is a function which assigns to each topological space a sequence of algebraic objects. We call
these objects homology classes. But computing the homology of an uncountable set (containing infinitely many points, such
as a disc in two dimensions) is not possible on a computer. We can show that under certain mild conditions we
can use random sampling from this space to extract its homology accurately with certain probability guarantees.
More specifically, a simplicial complex called the Rips complex, formed using these random samples, has the
same homology classes as the original topological space. We now define precisely a simplicial complex and
homology classes and describe the method of computing these homology classes. We then describe a process
for computing these homology classes distributively in a network.
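
As a small illustration of the Rips complex mentioned above, the sketch below builds it from a finite point sample using the usual definition: a set of points forms a simplex whenever all pairwise distances are at most a scale parameter. The parameter name `eps`, the function name, and the sample points are hypothetical, and no sampling guarantee is checked here.

```python
import math
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Rips complex up to dimension max_dim.

    Returns a dict mapping k to the list of k-simplices, each given as a
    sorted tuple of point indices whose pairwise distances are all <= eps.
    """
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(pair in close for pair in combinations(s, 2))]
    return simplices

# Four samples at the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
loose = rips_complex(square, eps=1.1)   # sides only: a hollow 4-cycle
dense = rips_complex(square, eps=1.5)   # diagonals included: cycle filled in
```

At the smaller scale the complex has four edges and no triangles, so it retains a cycle; at the larger scale the diagonals appear and the four triangles fill the cycle. Choosing the scale is therefore part of the modeling.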


Figure  25:  Assigning  algebraic  objects. 

We also investigate related constructions, namely simplicial complexes and chain complexes. A simplicial complex is a
generalization of the standard notion of a graph. In a graph we have a set of nodes and a set of pairs of nodes
which we call edges. The edges can be either directed or undirected (oriented or unoriented). A simplicial
complex generalizes this concept to form simplices, which can contain any number of nodes instead of two as
in edges. A $k$-th order simplex contains $k + 1$ nodes. A face of a $k$-th order simplex is a $(k-1)$-th order simplex
formed by removing one of the nodes. An oriented $k$-th order simplex is one in which an order is specified for the
set of $k + 1$ nodes. A more precise definition of a simplicial complex is as follows:

A simplicial complex $K$ is a collection of simplices $\sigma_i^k$, where $k$ is the order and $i$ is the index, such that:

•  Every face of a simplex $\sigma \in K$ is also in $K$.

•  The intersection of any two simplices (defined as the simplex with common vertices) is a simplex in $K$.

For a chain complex, we assign a sequence of vector spaces $C_k$ to a simplicial complex $K$. The points in
these spaces are called chains. The vector space $C_k$ is formed by taking all the $k$-th order simplices in $K$ as the
basis and taking all linear combinations over a field. We will usually work over the field of real numbers $\mathbb{R}$.

Given a chain complex with boundary operators $\partial_k : C_k \to C_{k-1}$, we define homology classes (vector spaces in this case) as the following quotient
space:

$$H_k(K) = \ker(\partial_k) / \operatorname{img}(\partial_{k+1})$$

The dimensions of the homology classes give us useful topological information. The dimension of $H_0(K)$ (also
called the zeroth Betti number, or $b_0$) gives us the number of connected components, $b_1$ gives us the number of cycles, $b_2$ gives
us the number of voids, and so on.
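
The Betti numbers can be computed directly from the ranks of the boundary matrices, since $b_k = \dim C_k - \operatorname{rank}(\partial_k) - \operatorname{rank}(\partial_{k+1})$. The sketch below is a minimal centralized illustration over the rationals; the data layout (a dict from order to sorted vertex tuples) and the function names are assumptions made for this example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    cols = len(rows[0]) if rows else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def boundary(k_simplices, faces):
    """Matrix of the boundary map: rows are (k-1)-simplices, columns k-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(k_simplices) for _ in faces]
    for col, s in enumerate(k_simplices):
        for pos in range(len(s)):
            face = s[:pos] + s[pos + 1:]        # drop the vertex at position pos
            mat[index[face]][col] = (-1) ** pos  # alternating orientation sign
    return mat

def betti(simplices, max_k):
    """Betti numbers b_0..b_max_k; simplices[k] lists sorted vertex tuples."""
    ranks = {0: 0}  # the boundary of a vertex is zero
    for k in range(1, max_k + 2):
        cur = simplices.get(k, [])
        ranks[k] = rank(boundary(cur, simplices[k - 1])) if cur else 0
    return [len(simplices.get(k, [])) - ranks[k] - ranks[k + 1]
            for k in range(max_k + 1)]

hollow = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
filled = {**hollow, 2: [(0, 1, 2)]}
hollow_betti = betti(hollow, 1)   # one component, one cycle
filled_betti = betti(filled, 1)   # one component, cycle filled by the triangle
```

The hollow triangle has one connected component and one cycle, while adding the 2-simplex removes the cycle.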

For computing these classes distributively, we make use of Laplacian operators on the simplicial complex $K$.
The $k$-th order Laplacian operator $L_k : C_k \to C_k$ is defined as

$$L_k = \partial_{k+1} \partial_{k+1}^T + \partial_k^T \partial_k$$

A special property of the Laplacian operator is that its action over a simplex can be expressed only in terms of
its adjacent simplices. We use this local property to compute the Laplacian of $K$ over a network. Now, we have
an important relation that the kernel of the $k$-th Laplacian is isomorphic to $H_k(K)$. Therefore, the problem now
reduces to computing distributively the null space of the $k$-th Laplacian.
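
The relation between the Laplacian kernel and homology can be checked on a tiny example. The sketch below is centralized, not the distributed computation described above: it hard-codes the boundary matrix of a 4-cycle (an assumed toy complex with no 2-simplices), forms $L_0$ and $L_1$, and computes their nullities. Helper names are hypothetical.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def nullity(m):
    """Dimension of the null space, via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    rank = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            f = rows[i][c] / rows[rank][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return len(rows[0]) - rank

# Boundary map of the edges of a 4-cycle: rows are vertices 0..3, columns are
# the oriented edges (0,1), (1,2), (2,3), (0,3).
d1 = [[-1, 0, 0, -1],
      [1, -1, 0, 0],
      [0, 1, -1, 0],
      [0, 0, 1, 1]]

L0 = matmul(d1, transpose(d1))  # vertex Laplacian (the other term is zero)
L1 = matmul(transpose(d1), d1)  # edge Laplacian (no 2-simplices in this complex)
b0, b1 = nullity(L0), nullity(L1)
```

Both nullities equal 1, matching one connected component and one cycle. The zero entries `L0[0][2]` and `L1[0][2]` correspond to non-adjacent vertices and non-adjacent edges, which illustrates the local property that makes a distributed computation plausible.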

4.2.4  Relevance  to  Original  Goals 

In order to identify failures in a large-scale network, a variety of measurements may be taken. This work characterizes
these measurements as data, along with some ‘Characteristics’ associated with it, and presents a solution for early detection and localization
of failures. We present the task of identifying the process taking place in the network as a Data Classification
problem in the field of Statistical Data Learning/Classification, which we can put to effective use when modeling
the network processes. Also, we explored the recently emerging field of Applied Topology to study network
data that do not admit any natural geometric properties.

4.3  Multi-Failure  Correlation  and  Spreading 

As  one  of  the  two  thrusts  we  intend  to  explore  in  this  project,  we  want  to  understand  and  characterize  the  impact  of 
multiple failures caused by WMD attacks on the network infrastructure. In our work of last year, we investigated
the multiple failure problem by modeling the network under attack as a devolution process and determined
the critical time for network partition. As failures occur over time, the network transits from its original fully
connected state into a partitioned state in which node communication distances become limited. Our results give
the first evaluation of this critical turning point in the network devolution process. However, we only considered
the scenario of random and independent node failures last year. The underlying assumption was that multiple
node failures are isolated events from one another. We note that in some cases node failures could be correlated
in the sense that a failure happens as a result of earlier failures. We extend our previous work by incorporating
failure correlations to advance our understanding of the impact of multiple failures on the network infrastructure.

4.3.1  What  is  Multi-Failure  Correlation? 

To design WMD resistant networks, we first need to understand the severity of impact that WMD attacks can create
in the network infrastructure. Many existing works in the literature have proposed a wealth of solutions that
improve the network resilience to failures via different techniques and strategies, including failure prevention,
failure detection, failure mitigation, and failure reparation, which aim to restore the normal network functioning
when failures happen. The majority of these studies deal with specific types of failures and their countermeasures,
for example, planning traffic paths away from the failure-prone regions, providing continuous surveillance coverage
when a subset of sensors fail, finding alternative routes when nodes or links become unavailable, and using
small-sized  packets  and  error  control  codes  to  communicate  in  networks  with  low-quality  links.  These  studies 
have  greatly  contributed  to  the  failure  resilience  of  large  networks.  From  a  different  perspective,  we  intend  to 
characterize  the  failure  impact  on  the  network  infrastructure.  Our  work  provides  a  fundamental  understanding  on 
the  quantification  of  failure  impact,  which  is  a  critical  supplement  to  the  existing  research  works  and  an  important 
step  towards  designing  effective  counter-failure  solutions  under  the  threats  of  WMD. 

Our prior work analyzed the critical phase transition time of a large network in which random failures gradually
break down an initially connected network into small pieces of disjoint components. The results present an
observation on the impact of failures from the network connectivity perspective. However, we did not characterize
all the failure possibilities under WMD threats. In some cases, causal relations exist among failures, i.e., some
failures  happen  as  a  result  of  other  earlier  failures  caused  by  WMD.  One  example  of  correlated  failures  is  traffic 
overloading  and  energy  depletion.  When  a  node  fails,  its  traffic  is  redistributed  to  the  neighboring  nodes  and 
these  neighboring  nodes  undertake  a  heavier  packet  relaying  load  than  before.  Some  neighbors  may  deplete  their 
energy  fast  and  fail  in  a  short  time.  Another  possible  case  of  failure  correlation  is  node  malfunction  and  protocol 
non-compliance  after  WMD  attacks.  When  a  subset  of  nodes  are  damaged  by  attacks,  they  may  not  comply  with 
their  designed  protocols  and  cause  other  nodes  to  work  in  unexpected  states.  The  node  malfunction  may  spread 
out  from  the  initially  damaged  nodes  to  other  nodes  as  a  result  of  the  protocol  non-compliant  working  states. 

In  view  of  the  potentially  devastating  impact  caused  by  correlated  failures,  we  characterize  the  spread  of 
correlated  failures  in  the  network  infrastructure.  In  particular,  we  attempt  to  determine  the  conditions  under  which 
a  single  initial  failure  will  and  will  not  spread  to  the  entire  network.  In  an  effort  to  gain  a  generally  applicable 
understanding  on  the  spread  of  correlated  failures,  we  model  the  failure  correlations  as  general  functions  and 
determine  their  characteristic  regimes  in  terms  of  the  ability  of  an  initial  failure  to  permeate  the  network.  The 
correlation  functions  model  the  geometric  constraints  in  failure  propagation,  i.e.,  the  distance  and  probability  for 
a failure to spread in one hop. Based on the correlation functions, we then use percolation theory to tackle the
failure spreading problem. Percolation theory provides a powerful tool for analyzing a wide range of contact-and-relay
problems observed in reality. We determine analytically the pervasiveness of failure spreading conditioned
on the failure correlation functions by using the concept of percolation. Intuitively, stronger correlations drive
an initial failure to spread into a larger area than weaker correlations. Our results provide a quantification of the
relation between failure correlations and failure pervasiveness in large networks. The results are useful for us to
evaluate and enhance the resilience of our network infrastructure against WMD attacks.

4.3.2  Modeling  of  Node  Failure  Correlation 

Main Results: Our contributions are the formal and generalized modeling of the correlated failures and determination
of the conditions in which an initial failure will and will not spread out to the entire network.

We model the communication network under WMD threats as a large network consisting of $n$ nodes in
a region $B = [-\frac{L}{2}, \frac{L}{2}]^2$ ($L \to \infty$). The number of nodes $n$ is assumed to be a Poisson distributed random
variable with constant density $\lambda$ everywhere in the network. Let $X_i$ ($1 \le i \le n$) denote the random location
of node $v_i$ that is uniformly distributed in the network, independent of $n$ and any $X_j$ ($j \ne i$). We know that
$\mathcal{H}_\lambda = \{X_1, \cdots, X_n\}$ is a homogeneous Poisson point process.

We model the node failure correlation by defining two probabilistic correlation functions. When a node fails,
it may trigger other failures in its neighbors. Let $\|X_i - X_j\|$ denote the Euclidean distance between $v_i$ and
$v_j$. We define the failure impact radius $r$ to be the farthest distance between the location of a triggering failure
and the location of an immediate follow-up failure, i.e., failure may propagate from $v_i$ to $v_j$ in one hop only
if $\|X_i - X_j\| \le r_i$, where $r_i$ is the $r$ of $v_i$. Considering the difference of the nodes in their respective failure
impacts, the impact radius $r$ is modeled as a random variable with probability density function $f(x)$ ($0 < x < 1$).
For different nodes $v_{i_1}$ and $v_{i_2}$, $r_{i_1}$ and $r_{i_2}$ are independent. Besides, we define the failure connection function
$g(x)$ to model the likelihood of failure propagation from $v_i$ to $v_j$. If $v_j$ is located within the impact radius of $v_i$,
failure spreads to $v_j$ with a probability $g(\|X_i - X_j\|)$. If $v_j$ is beyond the impact radius of $v_i$, failure cannot
spread from $v_i$ to $v_j$. For any two pairs of nodes, failure propagation is independent from each other.

To  differentiate  between  the  possibilities  that  a  node  may  be  at  risk  of  failure  only  once  and  that  a  node  may 
be  impacted  by  other  nodes’  failures  multiple  times,  we  categorize  failure  correlations  into  two  classes:  one-time 
correlation  and  persistent  correlation.  We  say  that  the  failures  are  one-time  if  each  node  is  subject  to  the  impact 
of other nodes only once. For example, if nodes $v_i$ and $v_j$ have failed sequentially and $v_k$ is within their impact
radii $r_i$ and $r_j$, then $v_k$ only has the failure risk after $v_i$. If $v_k$ does not fail after $v_i$, $v_k$ also does not fail when $v_j$
fails. We say that the failures are persistent if a node may fail each time one of its neighbors fails. In the example
we used above, $v_k$ may fail after both $v_i$ and $v_j$ with persistent failure correlation.

We use percolation theory to investigate the long-term trend of failure spreading when failures are correlated.
When failure propagates from $v_i$ to $v_j$ in one hop, we say that $v_i$ and $v_j$ are connected (or the connection is
open). Otherwise, $v_i$ and $v_j$ are not connected (or the connection is closed). Note that the connections considered
here represent the failure correlations among nodes, which are completely different from the communication
links. In percolation terminology, each node in the network is a site and failure connections define the bonds
between neighboring sites. As node locations are randomly distributed, this percolation process is also known
as continuum percolation. In its basic form, continuum percolation assumes an open bond between any two
neighboring sites. We have introduced here the probabilistic functions $f(x)$ and $g(x)$ for each bond, so our
network model is also called the random connection model, in which a bond is open only probabilistically.

Figure 26: Failure spreading in large wireless networks.

Given node locations and their bonds, our problem is to determine whether an initial failure will spread to
the entire network. An example of failure spreading is illustrated in Figure 26. In this example, the initial failure
occurs at node $v_0$. As a result of this failure, nodes $v_1$-$v_3$ fail subsequently and spread the failure further away
to nodes $v_4$-$v_{13}$. In each step of spreading, a node that has just failed in the previous step passes failure to a
random subset of nodes in its neighborhood, as modeled by the impact radius distribution function f(x) and the
connection function g(x). As time goes on, there are two possible outcomes: either the
spreading continues forever or it dies out on its own.
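The spreading process described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator used in the report: the function name `simulate_failure_spread`, the fixed node count (the Poisson count is replaced by its mean), and the callables `sample_radius` and `g` are all assumptions introduced here. In the one-time model a node that survives its first exposure is never exposed again; in the persistent model it may be exposed every time a neighbor fails.

```python
import math
import random


def simulate_failure_spread(lam, side, sample_radius, g, persistent=False, seed=0):
    """Spread failure from the node nearest the center of a side x side square.

    lam           -- node density (nodes per unit area)
    sample_radius -- callable rng -> impact radius, i.e. a draw from f(x)
    g             -- callable distance -> failure connection probability
    Returns (number of failed nodes, total number of nodes).
    """
    rng = random.Random(seed)
    # Simplification: use the mean of the Poisson node count.
    n = max(1, round(lam * side * side))
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    center = (side / 2, side / 2)
    start = min(range(n), key=lambda i: math.dist(pts[i], center))

    failed = {start}
    survived = set()      # nodes that already survived one exposure
    frontier = [start]    # nodes that failed in the previous step
    while frontier:
        next_frontier = []
        for i in frontier:
            r = sample_radius(rng)
            for j in range(n):
                if j in failed:
                    continue
                if not persistent and j in survived:
                    continue  # one-time correlation: no second exposure
                d = math.dist(pts[i], pts[j])
                if d <= r:
                    if rng.random() < g(d):
                        failed.add(j)
                        next_frontier.append(j)
                    else:
                        survived.add(j)
        frontier = next_frontier
    return len(failed), n


if __name__ == "__main__":
    # Uniform impact radius on (0, 1) and constant connection probability p.
    p = 0.8
    for mode in (False, True):
        failed, total = simulate_failure_spread(
            5.0, 10.0, lambda rng: rng.random(), lambda d: p, persistent=mode
        )
        print("persistent" if mode else "one-time", failed, "of", total)
```

On a finite square the spread always stops, so the sketch can only suggest whether a large fraction of nodes is reached; it does not decide percolation in the infinite-network sense used in the analysis.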

We represent the network infrastructure under WMD threats as a random geometric graph and denote
it as $G(\mathcal{H}_\lambda, f(\cdot), g(\cdot))$. Regarding node $v_0$ that initiates the failures, we define the percolation probability
$p_\infty(\lambda, f(\cdot), g(\cdot))$ as

$$p_\infty(\lambda, f(\cdot), g(\cdot)) \triangleq \Pr[|C(v_0)| = \infty], \qquad (53)$$

where $C(v_0)$ denotes the cluster of nodes that fail as a result of the initial failure at $v_0$ and $|C(v_0)|$ denotes the
size of cluster $C(v_0)$. When $|C(v_0)| = \infty$, we call $C(v_0)$ a giant component. If $p_\infty(\lambda, f(\cdot), g(\cdot)) > 0$, failure
percolates in the network with a positive probability. If $p_\infty(\lambda, f(\cdot), g(\cdot)) = 0$, failure almost surely does not
percolate in the network.

Our goal is to determine the regimes of functions f(x) and g(x) such that $p_\infty(\lambda, f(\cdot), g(\cdot)) > 0$ and
$p_\infty(\lambda, f(\cdot), g(\cdot)) = 0$, respectively, for a given $\lambda$. Obviously, $G(\mathcal{H}_\lambda, f(\cdot), g(\cdot))$ is a subgraph of $G(\mathcal{H}_\lambda, 1, 1)$ and
$p_\infty(\lambda, f(\cdot), g(\cdot)) = 0$ whenever $p_\infty(\lambda, 1, 1) = 0$. To avoid triviality, we only consider the case $p_\infty(\lambda, 1, 1) > 0$
and determine the respective constraints on f(x) and g(x) for $p_\infty(\lambda, f(\cdot), g(\cdot)) > 0$ and $p_\infty(\lambda, f(\cdot), g(\cdot)) = 0$.
It is well known that there exists a critical density $\lambda_c$ in $G(\mathcal{H}_\lambda, 1, 1)$ defined as

$$\lambda_c = \inf\{\lambda > 0 : p_\infty(\lambda, 1, 1) > 0\} \qquad (54)$$

that specifies the minimum $\lambda$ for $p_\infty(\lambda, 1, 1) > 0$. So far, the best known rigorous bounds on $\lambda_c$ are
$0.7698 < \lambda_c < 3.372$. Hence, we assume $\lambda > 3.372$. In addition, if r is a constant, we also assume $\lambda r^2 > 3.372$, which
guarantees $p_\infty(\lambda, r, 1) > 0$ by the scaling property.
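The scaling property invoked here can be written out explicitly. Rescaling all node coordinates by $1/r$ turns a Poisson process of density $\lambda$ with connection radius $r$ into one of density $\lambda r^2$ with unit radius, without changing which pairs of nodes are connected. This is a standard fact about planar continuum percolation, stated here as a reading aid rather than quoted from the report:

```latex
p_\infty(\lambda, r, 1) = p_\infty(\lambda r^2, 1, 1),
\qquad\text{so}\qquad
\lambda r^2 > 3.372 > \lambda_c
\;\Longrightarrow\;
p_\infty(\lambda, r, 1) > 0 .
```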

4.3.3  Impact  of  One-Time  Failure  Correlation 

Based on our generalized modeling of the failure correlations, we have found the following important results
regarding failure spreading in the network infrastructure after WMD attacks.

We aim to find the conditions under which an initial failure will or will not spread to the entire network
when the failures are one-time correlated. Specifically, we quantify the relations among the node
density $\lambda$, the failure impact radius r and the failure connection probability p that determine the failure spreading
trend. For the more general case of a random failure impact radius, we have the following result.

Theorem 8. With one-time failure correlation, for random failure impact radius r with probability density function
f(x) ($0 < x < 1$) and constant $g(x) = p$,

i) $p_\infty^{(one)}(\lambda, f(\cdot), p) > 0$ if $p\,h_1(f(x)) > \lambda_c/\lambda$;

ii) $p_\infty^{(one)}(\lambda, f(\cdot), p) = 0$ if $p\,(1 - h_2(f(x))) < \lambda_c/\lambda$,

where $h_1(f(x)) = \max_{0<a<1}\{a^2 \int_a^1 f(x)\,dx\}$ and $h_2(f(x)) = \max_{0<a<1}\{(1 - a^2) \int_0^a f(x)\,dx\}$.

Intuitively, r and p represent the degree of failure contagiousness. When r and p increase, failure tends to
percolate. When they decrease, percolation becomes unlikely. Our results in Theorem 8 present a
quantified measure on r and p. Moreover, in the general case of a random r, they indicate that the chance of failure
percolation increases if the probability distribution f(x) shifts toward large r (such that $h_1(f(x))$ increases) and
decreases if f(x) shifts toward small r (such that $h_2(f(x))$ increases).

When the failures have persistent correlation, we have derived the following results to characterize the
conditions under which failure spreading will and will not happen.

Theorem 9. With persistent failure correlation, for constant impact radius r and connection probability p,
$p_\infty(\lambda, r, p) > 0$ if $pr^2 > 1.8889/\lambda$.

Theorem 10. With persistent failure correlation, for constant impact radius r and connection probability p,
$p_\infty(\lambda, r, p) = 0$ if $pr^2 < -\ln(1 - 0.1642)/(2.5981\lambda)$.

Theorems 9 and 10 provide sufficient conditions to judge whether an initial failure will or will not
spread to the entire network when the failures are persistently correlated. Our results quantify the relation among
$\lambda$, r and p to predict the long-term failure spreading trend. We also generalize the results in Theorems 9
and 10 to arbitrary functions f(x) and g(x) to obtain more widely applicable results.

As a visual verification of our derived failure percolation conditions, we present simulation results.
Specifically, we give our simulated failure spreading results for Theorem 8. The simulation results
for the other theorems are similar, so we do not include them in this report.

Figure 27: Simulation results for failure percolation and non-percolation. (a) $p\lambda = 9.8$; (b) $p\lambda = 2.3$.


In our simulations, we assume that f(x) is a uniform distribution in the range $0 < x < 1$. It is not difficult to
find $h_1(f(x)) = 0.1481$ and $h_2(f(x)) = 0.3849$. By Theorem 8, we know that failure percolates if $0.1481\,p\lambda > \lambda_c$
and does not percolate if $0.6151\,p\lambda < \lambda_c$. Given $0.7698 < \lambda_c < 3.372$, we need to verify the percolation
results for $p\lambda > 3.372/0.1481 = 22.7684$ and $p\lambda < 0.7698/0.6151 = 1.2515$, respectively.

However, as existing simulation results have demonstrated that $1.43 < \lambda_c < 1.44$ with high confidence, we
simulate $p\lambda > 1.44/0.1481 = 9.7232$ and $p\lambda < 1.43/0.6151 = 2.3248$ instead. If failure percolates with $p\lambda > 9.7232$, it
must percolate with $p\lambda > 22.7684$. Similarly, if failure percolation does not happen with $p\lambda < 2.3248$, failure
definitely cannot percolate with $p\lambda < 1.2515$.
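The constants quoted for the uniform distribution can be reproduced with a short grid search. The sketch below assumes that $h_1$ integrates f over $(a, 1)$ and $h_2$ over $(0, a)$, which is the reading that yields 0.1481 and 0.3849; the function name `h1_h2_uniform` is introduced here for illustration only.

```python
def h1_h2_uniform(steps=100_000):
    """Grid-search h1 and h2 when f(x) is uniform on (0, 1).

    For uniform f, the integral of f over (a, 1) is 1 - a and the
    integral over (0, a) is a, so both maxima have closed forms:
    h1 = max a^2 (1 - a) and h2 = max (1 - a^2) a.
    """
    h1 = h2 = 0.0
    for k in range(1, steps):
        a = k / steps
        h1 = max(h1, a * a * (1.0 - a))
        h2 = max(h2, (1.0 - a * a) * a)
    return h1, h2


if __name__ == "__main__":
    h1, h2 = h1_h2_uniform()
    print("h1 =", round(h1, 4), " h2 =", round(h2, 4))
    # Thresholds on p * lambda from the rigorous bounds on lambda_c.
    print("percolation if p*lambda >", 3.372 / h1)
    print("non-percolation if p*lambda <", 0.7698 / (1.0 - h2))
```

The thresholds printed from unrounded $h_1$ and $h_2$ differ in the later digits from the figures in the text, which are computed from the four-digit values 0.1481 and 0.6151.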

Therefore, we choose $p\lambda = 9.8$ and $p\lambda = 2.3$, respectively, based on the above consideration. We present the
results of failure spread in a 20 × 20 area in Figure 27, where the initial failure occurs at the center of this area.
For clearer presentation, the figures show only the connections through which failure is propagated
from a failed node to a normal node (which fails after the propagation) and omit the connections
between any two nodes that have already failed before these connections are used for failure propagation. We
observe that failure is able to spread to a majority of nodes in Figure 27(a) but only to a small cluster
of nodes in Figure 27(b), which supports our claims in Theorem 8.

4.3.4  Impact  of  Persistent  Failures 

We explore the percolation of persistent failures in this section. We first focus on a constant
failure impact radius r and a constant failure connection probability p, and then study the generalized
functions f(x) and g(x). We discuss separately the sufficient conditions for
percolation and non-percolation of persistent failures with constant r and p. We use $p_\infty(\lambda, r, p)$ to denote the
percolation probability of persistent failures with constant r and p.

Sufficient Condition of Percolation: Constant r and p. Our discussion on the difference between persistent
and one-time failures shows that persistent failures percolate more easily. Therefore, percolation of persistent
failures should happen under the same sufficient condition as for one-time failures, i.e., $pr^2 > \lambda_c/\lambda$. As the current
best known rigorous bounds on $\lambda_c$ are $0.7698 < \lambda_c < 3.372$, we infer that percolation occurs when $pr^2 > 3.372/\lambda$.
In the regime $pr^2 < 3.372/\lambda$, we cannot draw a conclusion on failure percolation. Further judgment depends on improved
accuracy of the bounds on $\lambda_c$.

However, since persistent failures percolate more easily than one-time failures, we expect a tighter sufficient
condition that allows percolation with smaller r and p than $pr^2 > 3.372/\lambda$. We next determine a new condition by
using the technique of continuum-to-discrete percolation mapping. We divide the network area into many small
hexagonal cells [53]. As failure spreads, it travels through a cluster of contiguous cells. We define a cell as open
if it contains at least one failed node and closed otherwise. Let $C_0^{\mathrm{cell}}$ denote the cluster of open cells and $|C_0^{\mathrm{cell}}|$
denote its size. It is obvious that $|C_0^{\mathrm{cell}}| = \infty$ if $|C_0| = \infty$, and vice versa. The mapping between the cluster of
failed nodes and the cluster of open cells thus allows us to find the sufficient condition for $|C_0^{\mathrm{cell}}| = \infty$ and use it
for $|C_0| = \infty$.

We observe that when the cells have hexagonal shape, the discrete lattice is triangular. In the literature, square
cells are usually used as they generate a square lattice that is simple and easy to analyze. We use hexagonal cells
in this paper for two reasons. First, hexagonal cells yield a tighter bound on r and p for failure percolation than
square cells. Second, they render the study of failure non-percolation possible under this mapping framework.
With a triangular lattice, there exists a critical probability $p_c = 2\sin(\pi/18) = 0.3473$ [?] such that percolation
occurs, i.e., $|C_0^{\mathrm{cell}}| = \infty$, if each bond is open with a probability higher than $p_c$ and does not occur, i.e., $|C_0^{\mathrm{cell}}| < \infty$,
otherwise. The discrete percolation in the triangular lattice allows us to derive a tighter sufficient condition for failure
percolation. Our result is

Theorem 11. For constant r and p, $p_\infty(\lambda, r, p) > 0$ if $pr^2 > 1.8889/\lambda$.
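The constants that appear in the hexagonal-cell mapping can be checked numerically. The sketch below computes the bond percolation threshold of the triangular lattice, $2\sin(\pi/18)$, and the area of a regular hexagon, whose unit-side value $3\sqrt{3}/2 \approx 2.5981$ matches the constant in the non-percolation bounds. The helper `prob_cell_nonempty` is an illustrative assumption: it gives the probability that a hexagonal cell contains at least one node of a Poisson process, which is only a prerequisite for the cell being open in the sense used above, not the report's bond-open probability.

```python
import math

# Bond percolation threshold of the triangular lattice.
P_C_TRIANGULAR = 2 * math.sin(math.pi / 18)


def hexagon_area(side):
    """Area of a regular hexagon with the given side length."""
    return 1.5 * math.sqrt(3) * side ** 2


def prob_cell_nonempty(lam, side):
    """Probability that a hexagonal cell holds at least one Poisson node."""
    return 1.0 - math.exp(-lam * hexagon_area(side))


if __name__ == "__main__":
    print("p_c (triangular lattice):", round(P_C_TRIANGULAR, 4))
    print("area of unit hexagon:", round(hexagon_area(1.0), 4))
    print("P(cell non-empty), lam=5, side=0.3:", prob_cell_nonempty(5.0, 0.3))
```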


Sufficient Condition of Non-Percolation: Constant r and p. Our result on the sufficient condition of percolation
has quantified the r and p that are large enough for an initial failure to spread to the entire network. Similarly,
we are also interested in determining r and p such that failure cannot percolate in the network. From the
continuum-to-discrete percolation mapping, we know that $|C_0| < \infty$ if $|C_0^{\mathrm{cell}}| < \infty$. Thus, our task is to find the
condition for $|C_0^{\mathrm{cell}}| < \infty$.

We  still  divide  the  network  into  many  hexagonal  cells.  Compared  to  square  cells,  failure  non-percolation  is 
easier  to  study  with  hexagonal  cells.  When  we  choose  the  cell  size  sufficiently  large,  the  failed  nodes  in  one 
cell  can  only  connect  to  the  six  neighboring  cells  and  the  connections  are  symmetric  in  probability.  We  can 
focus  on  the  connections  between  any  two  neighboring  cells  to  understand  failure  spreading.  With  square  cells, 
however,  a  cell  has  eight  neighbors:  four  horizontal  or  vertical  ones  and  four  diagonal  ones.  Its  connections 
to  the  horizontal  or  vertical  neighbors  are  not  symmetric  to  those  to  the  diagonal  neighbors,  rendering  failure 
percolation  difficult  to  analyze.  By  mapping  into  hexagonal  cells,  we  have  the  following  theorem  to  characterize 
the  sufficient  condition  on  r  and  p  for  failure  non-percolation. 


Theorem 12. For constant r and p, $p_\infty(\lambda, r, p) = 0$ if $pr^2 < -\ln(1 - 0.1642)/(2.5981\lambda)$.

Figure 28: Function $\Psi(g'(x), X_0, A) = \int_A g'(\|X_0 - X_\xi\|)\,d\xi$.

Percolation of Persistent Failures: General Failure Correlations. When the failure impact radius r is random
and the failure connection function g(x) is in a general form, modeling of failure spread becomes even
more complicated. We present two theorems in this section that characterize the general failure correlations for
percolation and non-percolation. To facilitate our study, we first combine the functions f(x) and g(x) into a unified
failure connection function.

Lemma 4. For random r with probability density function f(x) ($0 < x < 1$) and general g(x), the failure
correlations can be modeled equivalently by a constant $r' = 1$ and a unified connection function
$g'(x) = g(x) \int_x^1 f(z)\,dz$ if $0 < x \le 1$ and $g'(x) = 0$ if $x > 1$.
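Lemma 4 can be illustrated numerically. The sketch below assumes the unified function is $g'(x) = g(x)\int_x^1 f(z)\,dz$ on $0 < x \le 1$, i.e. the connection probability at distance x multiplied by the probability that the impact radius reaches at least x; the function name `unified_connection` and the midpoint-rule integration are choices made here for illustration.

```python
def unified_connection(g, f, x, steps=2000):
    """Evaluate g'(x) = g(x) * integral_x^1 f(z) dz for 0 < x <= 1, else 0.

    g -- failure connection function of distance
    f -- probability density of the impact radius on (0, 1)
    """
    if not 0.0 < x <= 1.0:
        return 0.0
    h = (1.0 - x) / steps
    # Midpoint rule for the tail probability P(radius >= x).
    tail = sum(f(x + (k + 0.5) * h) for k in range(steps)) * h
    return g(x) * tail


if __name__ == "__main__":
    uniform_f = lambda z: 1.0   # uniform density on (0, 1)
    constant_g = lambda d: 0.4  # constant connection probability p = 0.4
    for x in (0.1, 0.5, 0.9, 1.2):
        print(x, unified_connection(constant_g, uniform_f, x))
```

For a uniform radius and constant $g(x) = p$, the result reduces to $g'(x) = p(1 - x)$, a linearly decreasing function of distance.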

We skip the proof of Lemma 4 due to space constraints. Given the equivalence in modeling failure correlations,
we consider $r'$ and $g'(x)$ instead of f(x) and g(x) in the rest of this section. For clearer presentation, we now
define a few concepts that will be used in the following theorems. We define the function

$$\Psi(g'(x), X_0, A) \triangleq \int_A g'(\|X_0 - X_\xi\|)\,d\xi, \qquad (55)$$

which is the integration of the probability $g'(x)$ over region A with respect to location $X_0$, as illustrated
in Figure 28. In addition, with respect to all possible $X_0$ and A ($|A| = \sigma$), we define
$\Psi_{\min}(g'(x), \sigma) = \min_{\{X_0, |A| = \sigma\}}\{\Psi(g'(x), X_0, A)\}$ and $\Psi_{\max}(g'(x), \sigma) = \max_{\{X_0, |A| = \sigma\}}\{\Psi(g'(x), X_0, A)\}$, which denote the
minimum and maximum of $\Psi(g'(x), X_0, A)$, respectively, when we change $X_0$ and A arbitrarily, as long as the
area of A is kept constant at $\sigma$. With the help of these definitions, we present our results regarding failure
percolation with general f(x) and g(x) as follows.

First, the sufficient condition of percolation regarding f(x) and g(x) is

Theorem 13. For general f(x) and g(x), $p_\infty(\lambda, f(\cdot), g(\cdot)) > 0$ if $\Psi_{\min}(g'(x), 0.1999) > 0.4266/\lambda$.

Second, the sufficient condition of non-percolation is

Theorem 14. For general f(x) and g(x), $p_\infty(\lambda, f(\cdot), g(\cdot)) = 0$ if $\Psi_{\max}(g'(x), 1.2535) < -\ln(1 - 0.1642)/\lambda$.
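The integral in (55) is straightforward to approximate numerically for a given region. The sketch below is only an illustration under stated assumptions: the region A is taken to be an axis-aligned rectangle (the theorems allow arbitrary regions of fixed area), the integral is evaluated with a midpoint rule, and the function name `psi` and its parameters are hypothetical.

```python
import math


def psi(g_prime, x0, rect, n=200):
    """Approximate the integral of g'(||x0 - xi||) over a rectangle.

    g_prime -- unified connection function of distance
    x0      -- reference location (x, y)
    rect    -- (xmin, ymin, xmax, ymax), an assumed shape for region A
    n       -- grid resolution per axis for the midpoint rule
    """
    xmin, ymin, xmax, ymax = rect
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    total = 0.0
    for i in range(n):
        px = xmin + (i + 0.5) * dx
        for j in range(n):
            py = ymin + (j + 0.5) * dy
            total += g_prime(math.hypot(px - x0[0], py - x0[1]))
    return total * dx * dy


if __name__ == "__main__":
    # Linearly decreasing connection function, zero beyond unit distance.
    linear = lambda d: 1.0 - 0.56 * d if d <= 1.0 else 0.0
    print(psi(linear, (0.0, 0.0), (0.0, 0.0, 0.4, 0.4)))
```

Evaluating $\Psi_{\min}$ or $\Psi_{\max}$ would additionally require a search over $X_0$ and over regions of the prescribed area, which this sketch does not attempt.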

To verify Theorems 13 and 14, we consider a special case of the function $g'(x)$. We let $g'(x)$ be a linearly
decreasing function of x, i.e., the probability of failure propagation decreases as the hop distance increases. As
before, we generate randomly located nodes in a 20 × 20 area with node density $\lambda = 5$. The initial failure
happens at the center of this area. We first specify $g'(x) = 1 - 0.56x$ ($0 < x < 1$), which satisfies the
condition given in Theorem 13, and we observe that the failure percolates in the network. We then set
$g'(x) = 0.0052 - 0.0052x$ ($0 < x < 1$), which satisfies the condition given in Theorem 14.

Our analysis of failure spread provides a mathematical framework to evaluate the resilience of the network
infrastructure against correlated failures. Given a network with node density $\lambda$, we may sample the nodes to
estimate their failure impact radius distribution f(x) and their failure propagation probability g(x), and determine
the characteristic regime (percolation or non-percolation) that $\lambda$, f(x) and g(x) fall into. This evaluation allows
us to predict the potential impact on the network when failure occurs.
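As a concrete instance of this regime check, the sketch below classifies a one-time failure scenario with a uniform impact radius on (0, 1), using the constants 0.1481 and 0.6151 and the rigorous bounds $0.7698 < \lambda_c < 3.372$ quoted in Section 4.3.3. The function name `classify_one_time` and the three labels are introduced here for illustration; other radius distributions would need their own $h_1$ and $h_2$.

```python
H1 = 0.1481            # h1 for a uniform impact radius on (0, 1)
ONE_MINUS_H2 = 0.6151  # 1 - h2 for the same distribution
LAMBDA_C_LOW, LAMBDA_C_HIGH = 0.7698, 3.372  # rigorous bounds on lambda_c


def classify_one_time(p, lam):
    """Classify the spreading regime for one-time failures.

    p   -- failure connection probability
    lam -- node density
    Uses only the sufficient conditions, so some inputs stay undetermined.
    """
    if H1 * p * lam > LAMBDA_C_HIGH:
        return "percolation"
    if ONE_MINUS_H2 * p * lam < LAMBDA_C_LOW:
        return "non-percolation"
    return "undetermined"


if __name__ == "__main__":
    for p, lam in ((1.0, 30.0), (0.2, 5.0), (1.0, 5.0)):
        print(p, lam, classify_one_time(p, lam))
```

The "undetermined" band reflects the gap between the two sufficient conditions and the looseness of the rigorous bounds on $\lambda_c$.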

Besides network resilience evaluation, our analysis also indicates a few strategies to prevent correlated failures
from spreading widely in the network. The analysis shows that failure becomes unlikely to percolate if we
reduce the impact radius r and the failure connection probability p for a given node density $\lambda$. We are
hence able to control the failure spread by bounding r and p.

4.3.5  Relevance  to  Our  Goals 

One of our research thrusts is to understand the impact of WMD attacks on the network infrastructure. An insightful
understanding of the attack impact provides the foundation for our efforts to design attack-resistant networks.
Our work on the failure spreading characterization advances our previous work on the network devolution under
random node failures and presents new knowledge regarding the network resilience in an environment with
potential WMD threats. Our results have the following unique features:

•  We  have  presented  a  formal  and  generalized  analytical  model  to  characterize  the  correlations  among  a 
variety  of  failures.  This  model  captures  the  essential  features  of  failure  correlations. 

•  For one-time failures, we contain the failure spread by limiting either r or p. To limit r, each node in
the network is configured not to execute any suspicious command received from nodes located beyond a
certain distance, which effectively reduces the impact radius of each failed node. To limit p, we need to
sample and test some portion of nodes in the network to ensure that these tested nodes are not vulnerable
to the type of failure that we are concerned with, such that the failure probability of an arbitrary node in
the network is controllably low after the test.

•  For persistent failures, however, we may not be able to reduce r or p separately. In the example of traffic
overloading, if we require the routing logic to limit the path repair radius to a small value, the failure
probability of each node within this radius might be high, as each node is expected to receive a large share
of the re-routed traffic. In other words, r and p are coupled. Therefore, the best strategy to reduce the
degree of failure correlations is to incorporate load balancing into the routing protocol design.

•  In addition to the network resilience evaluation, our results also suggest network design criteria that ensure
the network architecture meets given resilience requirements.

4.4  Inter-Cooperation toward Robust Communications

Since the radio resource in the recovery network is limited, we should make full use of it to
maximize the information delivery capability of the recovery network. Our work on localized scheduling
achieves this goal by coordinating node transmissions carefully to avoid collisions. When
each node has a large amount of information to exchange with others, the radio resource in the network is fully
used by our scheduling scheme. However, if nodes do not have sufficient information to send, the radio
resource is wasted when a node is assigned a radio channel but does not send any information over it.
In such cases, we can temporarily allow other nodes to borrow the unused channels to improve the
resource utilization; a network operating in this way is called a Cognitive Radio Network (CRN). We have hence studied the information
propagation problem in cognitive radio networks as a supplementary strategy to maximize the utilization of the recovery
network.

4.4.1  Objectives  and  Spectrum  Recycling 

Our research objective is to understand the feasibility and capability of delivering information in the recovery
network by using the opportunistically available radio channels. In recent years, there has been intensive research
on understanding and optimizing the performance limits of cognitive radio networks. However, an interesting
question remains open: what are the achievable benefits of information dissemination in such networks? The answer
is essential to fully exploring the potential and applications of cognitive radio networks. Understanding
how packets disseminate and their temporal and spatial limits, such as dissemination area, transmission speed
and latency, can also benefit the deployment, design and application of cognitive radio networks.

Similar problems, on the other hand, have been studied for wireless multihop networks or sensor networks,
which explore the conditions for connectivity or percolation in order to ensure the information can be disseminated
to the whole network. In addition, information propagation speed or delay has been discussed in recent
works, which categorized the delay into bandwidth-incurred propagation delay and topology-incurred delay. In
particular, when the network topology remains unchanged or changes very slowly, the bandwidth-incurred propagation
delay, which is the transmission time spent by a packet in all the links along its transportation path, is
dominant, while topology-incurred delay is negligible.

However, these existing results on multihop wireless networks are not applicable to cognitive radio networks.
For instance, the network topology in cognitive radio networks is dynamic not only because of factors such
as user mobility and radio link quality, but also because the channels are only opportunistically available over
time. As a result, the network is more likely to be a percolated network. That is, the network is almost surely
connected, rather than fully connected as assumed in earlier works. Furthermore, in current studies on
network topology and performance, the critical density is a key condition in identifying whether a network is
percolated or not. When node density is higher than the critical density, the network is considered percolated;
otherwise, the network is not percolated. The challenge is that prior results are based on a common assumption of
homogeneous networks in which all nodes are the same. Nonetheless, a cognitive radio network is intrinsically
a heterogeneous network because primary nodes and secondary nodes differ from each other in
their transmission ranges, usages of communication channels, locations, and routing functions. Therefore, how
to determine the conditions for a percolated cognitive radio network is an open problem.

Our research approach is to determine the fundamental limits for information propagation in cognitive radio
networks by defining new models and new metrics that capture the most basic and important characteristics
of information propagation. Specifically, we focus on the following two questions: (1) for a large multi-channel
cognitive radio network, how far can a packet originated from an arbitrary node be disseminated? (2)
When a packet can be disseminated to a sufficiently large area, how long does it take this packet to reach a
chosen destination? To tackle these problems, we define two new metrics, the dissemination radius $\|\mathcal{L}(t)\|$ and
the propagation speed $S(d)$, to study the spatial and temporal limits, respectively. The former is the maximum
Euclidean distance that a packet disseminates in time t and can be used to characterize the dissemination area.
The latter is the speed at which a packet travels between a source and a destination at distance d apart, which can be
used to interpret the end-to-end delay. Here, we focus on the topology-incurred delay and ignore the bandwidth-incurred
propagation delay. Our study has led to several insights on $\|\mathcal{L}(t)\|$ and $S(d)$.

4.4.2  Main  Results  and  Inter-Operation 

Main Results: We consider a large cognitive radio network consisting of n secondary nodes $\{v_1, \ldots, v_n\}$, which
are distributed independently and uniformly in a region $\Omega = [0, \sqrt{n/\lambda}]^2$ for some constant $\lambda$ and opportunistically
access a set of channels $\{ch_1, \ldots, ch_m\}$. Each $ch_k$ is licensed to an overlaid primary network that is Poisson distributed
with density $\lambda_{pk}$. Instead of a homogeneous transmission range, each $v_i$ is assumed to have an independently
adaptive transmission range $r_i$ with $P(r_i \le \gamma) = 1$ for some constant $\gamma > 0$, to save energy and limit interference
with primary nodes. We assume that $r_i$ follows a common distribution $F_r$ for simplicity. Let $\mathcal{H}_\lambda = \{X_1, \ldots, X_n\}$
denote the random locations of secondary nodes. Denote $\|X_i - X_j\|$ as the Euclidean distance between $v_i$ and
$v_j$; the two nodes may communicate with each other directly only when $\|X_i - X_j\| \le \min(r_i, r_j)$. We only consider
$\gamma = 1$, and the results derived here can be easily extended to scenarios with any $\gamma > 0$.
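The link rule in this model, that two secondary nodes can communicate directly only when their distance does not exceed the smaller of their two transmission ranges, can be sketched as follows. The function name `secondary_links` and the list-based inputs are assumptions made for illustration; channel availability and the on/off activity of nodes are not modeled here.

```python
import math


def secondary_links(positions, ranges):
    """Return index pairs (i, j), i < j, of secondary nodes that can link.

    positions -- list of (x, y) node locations
    ranges    -- list of transmission ranges, one per node
    A link exists only if the distance is at most min(ranges[i], ranges[j]).
    """
    links = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d <= min(ranges[i], ranges[j]):
                links.append((i, j))
    return links


if __name__ == "__main__":
    pos = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
    print(secondary_links(pos, [1.0, 0.6, 1.0]))
    print(secondary_links(pos, [1.0, 0.4, 1.0]))
```

Because the rule uses the minimum of the two ranges, a single node that shrinks its range can remove a link even when its neighbor could still reach it.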

In cognitive radio networks, each secondary node $v_i$ needs to frequently sense the communications among
primary users in its neighborhood before transmission to avoid interference. That is, each $v_i$ alternates independently
between the communicating (active) state and the sensing (inactive) state, with periods determined by the
stationary i.i.d. on/off process $W(t)$. Denote $\eta$ as the stationary probability of $v_i$ being active. Let $R_I$ be the
interference range of primary nodes. We say $ch_k$ is available to the link $v_i v_j$ if there are no primary users using
$ch_k$ within $B(X_i, R_I) \cup B(X_j, R_I)$, where $B(x, r)$ denotes a circle with radius r centered at x. Denote $P_s$ as
the probability that there exists at least one channel available to $v_i v_j$. We represent the cognitive radio network as
$G(\mathcal{H}_\lambda, F_r, W(t))$. To address our research problems, we have defined a few concepts.
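The channel availability probability can be computed directly from the expression given later with Theorem 15, $P_s = 1 - \prod_{k=1}^{m}(1 - e^{-\lambda_{pk} Q})$. The sketch below assumes that Q is the area of the region that must be free of primary users, so that $e^{-\lambda_{pk} Q}$ is the probability that a Poisson primary network of density $\lambda_{pk}$ has no node in it. That interpretation of Q, and the function name `channel_available_prob`, are assumptions rather than statements from the report.

```python
import math


def channel_available_prob(primary_densities, q_area):
    """Probability that at least one of m channels is free for a link.

    primary_densities -- iterable of primary-network densities, one per channel
    q_area            -- assumed area Q that must contain no primary user
    """
    all_blocked = 1.0
    for lam_pk in primary_densities:
        # Channel k is blocked unless the Poisson process is empty in Q.
        all_blocked *= 1.0 - math.exp(-lam_pk * q_area)
    return 1.0 - all_blocked


if __name__ == "__main__":
    print(channel_available_prob([1.0], 1.0))
    print(channel_available_prob([1.0, 1.0, 1.0], 1.0))
```

Adding channels can only increase $P_s$, which is why opportunistic access over several licensed channels helps keep secondary links usable.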

4.4.3  Dissemination  Area 

We consider that information is disseminated through broadcasting in cognitive radio networks. Let $V(t)$ denote
the cluster of nodes that have received the packet by time t, given that the packet is sent at time $t = 0$.
The dissemination area at t, $A(t) \subset \mathbb{R}^2$, that is, the total area covered by $V(t)$, can be expressed as
$A(t) = \bigcup_{v_i \in V(t)} B(X_i, 1)$, where $B(x, r)$ is a ball with radius r centered at point $x \in \mathbb{R}^2$ and $X_i$ is the location of $v_i$.

Figure 29: Illustration of information dissemination in a percolated CR network.

Figure 30: An illustration of dissemination area after long time t.


An illustration of information dissemination in a cognitive radio network $G(\mathcal{H}_\lambda, F_r, W(t))$ is shown in Figure
29, in which a packet is originated by node $v_0$. When $v_0$ starts broadcasting this packet at time 0, its neighbors
receive the packet and rebroadcast it. Here two nodes $v_i$ and $v_j$ are called "neighbors" at $t = t'$
if the link $v_i v_j \in G(\mathcal{H}_\lambda, F_r, W(t'))$. Without considering propagation delay, at $t = 0$, the packet has spread to
the cluster $V(0) \subset G(\mathcal{H}_\lambda, F_r, W(0))$ containing $v_0$ (see Figure 29(a)). We refer to $\tau$ as the topology-incurred delay for
a node $v \in V(\tau) \setminus V(\tau^-)$ to receive this packet from $v_0$.

Assume that $v_0$ is located at the origin $o \in \mathbb{R}^2$. Illustrations of the dissemination area for sufficiently large t
are shown in Figure 30. Since the network is not fully connected, any sufficiently large area B is only partially
covered by $A(t)$. The uncovered area, which is also called vacancy in this paper, is shown in Figure 30 as shaded
areas. Letting $A_\infty = \lim_{t \to \infty} A(t)$, we illustrate $A_\infty < \infty$ and $A_\infty = \infty$ in Figure 30(a) and Figure 30(b),
respectively. Denote $L_\varphi$ as the line starting from the origin o in the direction $\varphi \in [0, 2\pi)$ and $\mathcal{L}_\varphi(t) = oz$, where
$z = \arg\max_{v \in L_\varphi \cap A(t)} \|v\|$ is the farthest intersection point between $L_\varphi$ and $A(t)$. For example, in Figure 30(b),
$\mathcal{L}_{\varphi_1}(t)$ is the line segment $oz_1$. The length of $\mathcal{L}_\varphi(t)$, $\|\mathcal{L}_\varphi(t)\|$, is defined as the transmitting distance at t.

Definition 7 (Dissemination radius $\|\mathcal{L}(t)\|$). The dissemination radius at time t is defined as $\|\mathcal{L}(t)\| = \max_{\varphi \in [0, 2\pi)} \|\mathcal{L}_\varphi(t)\|$.
The limiting dissemination radius is defined as $\|\mathcal{L}(\infty)\| = \lim_{t \to \infty} \|\mathcal{L}(t)\|$.
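For a finite set of nodes that have received the packet, the dissemination radius has a simple closed form if each node covers a disk of fixed radius around itself: the farthest covered point from the origin lies one coverage radius beyond the farthest node. The sketch below assumes a unit coverage radius (consistent with the case $\gamma = 1$ considered here) and a source at the origin; the function name `dissemination_radius` and its parameters are illustrative.

```python
import math


def dissemination_radius(received_positions, origin=(0.0, 0.0), cover_radius=1.0):
    """Largest distance from the origin reached by the covered area.

    received_positions -- locations of nodes that hold the packet at time t
    cover_radius       -- assumed radius of the disk covered by each node
    """
    farthest_node = max(math.dist(origin, p) for p in received_positions)
    return farthest_node + cover_radius


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (0.8, 0.3), (3.0, 4.0)]
    print(dissemination_radius(cluster))
```

Taking the maximum over nodes is equivalent to taking the maximum over all directions of the farthest covered point along each ray from the origin.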

The dissemination radius indicates how far a packet can reach in the spatial domain in a large network. Next, we
move on to the temporal domain. An intuitive definition of information propagation speed is $\|\mathcal{L}_\varphi(t)\|/t$,
which gives the maximum speed in direction $\varphi$. However, due to the dynamics in cognitive radio networks, a node
closer to $v_0$ may receive the packet later than a farther node. For example, in Figure 29, $v_2$ receives the packet
later than $v_1$. Instead of the maximum speed, we are more interested in how long it takes the packet to reach
a chosen destination at distance d. Thus, the Information Propagation Speed is defined as follows.

Definition 8 (Information propagation speed $S_\varphi(d)$). Let $t_\varphi(d)$ be the point on $L_\varphi$ with $\|t_\varphi(d)\| = d$ (see
Figure 30). Denote $T(v_0, v) = \arg\min_{t \ge 0}\{v \in V(t)\}$ as the topology-incurred delay of the node v. Denote
$v_\varphi(d)$ as the node closest to $t_\varphi(d)$ which can receive the packet. That is, $v_\varphi(d) = \arg\min_{v \in V_\infty} \|v - t_\varphi(d)\|$,
where $V_\infty = \lim_{t \to \infty} V(t)$. When $\|\mathcal{L}(\infty)\| = \infty$, the Information Propagation Speed in direction $\varphi$ is defined
as $S_\varphi(d) = d / T(v_0, v_\varphi(d))$. When $\|\mathcal{L}(\infty)\| < \infty$, $S_\varphi(d) = d / T(v_0, v_\varphi(d))$ for $d \le \|\mathcal{L}(\infty)\|$, and $S_\varphi(d) = 0$ for
$d > \|\mathcal{L}(\infty)\|$. The limiting propagation speed $S_\varphi(\infty)$ is defined as $\lim_{d \to \infty} S_\varphi(d)$. The notation $S(d)$
denotes the propagation speed in an arbitrary direction.


For convenience of presentation, we introduce the following terminology, which differentiates infinite and finite
limits of information dissemination.

Definition 9. For a packet b originated by $v_0$ at $t = 0$, b is said to be disseminated locally if $\|\mathcal{L}(\infty)\| < \infty$
and globally otherwise. In particular, b is said to be disseminated globally "instantaneously" if $\|\mathcal{L}(0)\| = \infty$,
disseminated globally "within finite time" if $\|\mathcal{L}(0)\| < \infty$ but $\|\mathcal{L}(t)\| = \infty$ for some bounded t, and
disseminated "gradually" if $\|\mathcal{L}(t)\| < \infty$ for any t but $\|\mathcal{L}(\infty)\| = \infty$.


Based on our models and definitions, we have discovered the following important results regarding the information propagation radius and speed.


Theorem 15. For a cognitive radio network $G(\mathcal{H}_\lambda, F_r, W(t))$, given the number of channels $m$ and the spatial densities of primary nodes $\lambda_p = \{\lambda_{pk}\}_{k=1}^{m}$, there exists a critical density $\lambda_{c,c}$ of secondary nodes, above which $G(\mathcal{H}_\lambda, F_r, W(t))$ remains percolated for all $t$ and below which $G(\mathcal{H}_\lambda, F_r, W(t))$ is not percolated for all $t$. Specifically, we have
[...] $0 < \|e\| < 0.5$ [...] $\|e\|^{2}\,\eta\,(1 - F_r(2\|e\|))$,
where $P_c$ is the bond open probability sufficient for percolation on a dependent triangular lattice and $P_s = 1 - \prod_{k=1}^{m}\left(1 - e^{-\lambda_{pk} Q}\right)$. Furthermore, almost surely,
[...]
where $T = 2\pi\rho \int_0^{r} x\,(1 - F_r(x))\,P_s\,dx\,dF_r$ and $C_{lcc}$ is the Link Correlation Coefficient.

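The quantity $P_s$ in Theorem 15 can be evaluated numerically. The sketch below assumes that $Q$ is an area, so that $e^{-\lambda_{pk} Q}$ is the probability that no primary node on channel $k$ falls within it and $P_s$ is the probability that at least one channel is free; this reading of $Q$ and the function name are assumptions made for illustration, not statements from the report.

```python
import math

def spectrum_opportunity_probability(primary_densities, area):
    """P_s = 1 - prod_k (1 - exp(-lambda_pk * Q)).

    primary_densities: per-channel densities lambda_pk of primary nodes.
    area: the quantity Q, interpreted here (an assumption) as an area.
    """
    all_channels_busy = 1.0
    for lam in primary_densities:
        # 1 - exp(-lam * Q): at least one primary node on this channel in Q.
        all_channels_busy *= 1.0 - math.exp(-lam * area)
    return 1.0 - all_channels_busy
```

Under this reading, $P_s$ equals 1 when $Q = 0$ and decreases as $Q$ or the primary densities grow.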

Theorem 16. (i) When $\lambda_{c,w} < \lambda < \lambda_{c,c}$ and $v_0 \in C_\infty(G(\mathcal{H}_\lambda, 1))$, information $b$ originated by $v_0$ can be disseminated gradually at some constant speed $S_\varphi(d) = \kappa$, for sufficiently large $d$. (ii) When $\lambda > \lambda_{c,c}$ and $v_0 \in C_\infty(G(\mathcal{H}_\lambda, 1)) \setminus C_\infty(G(\mathcal{H}_\lambda, F_r, W(0)))$, $b$ can be disseminated globally within finite time with probability 1. (iii) When $\lambda < \lambda_{c,w}$, $b$ can only be disseminated locally with probability 1 and the limiting speed $S_\varphi(\infty) = 0$.


Figure 31 shows an example of how the packet $b$ originated from $v_0$ disseminates when $\lambda_{c,w} < \lambda < \lambda_{c,c}$, where the bigger dots connected by the solid line denote the cluster of nodes that have received $b$ at time $t$.

Figure 31: Snapshots of packets disseminating gradually when $\lambda_{c,w} < \lambda < \lambda_{c,c}$.

Figure 32: Dissemination radius and propagation speed analysis.


Particularly, given $T_{on} = 0.9$ s, $T_{off} = 0.1$ s and $\lambda = 230$ (per km$^2$), we find by simulation that $G(\mathcal{H}_\lambda, 1)$ is percolated but $G(\mathcal{H}_\lambda, F_r, W(t))$ is not. As expected, in this case, only a very small set of nodes will receive $b$ initially, as shown in Figure 31(a). However, $b$ will gradually be disseminated to more and more nodes, as shown in Figure 31(b) and Figure 31(c).

For comparison, we also consider an ideal dissemination strategy, which has been well studied in wireless sensor networks, by assuming that once a secondary node receives $b$, it stays active and keeps sending $b$ using maximum transmission power, until all nodes within its maximum transmission range receive $b$. Compared with Figure 31(c), Figure 32(a) shows that, with the ideal dissemination strategy, the information disseminates at least two times faster.

The average dissemination radius based on 100 independent simulations with parameters $\lambda = 230$ (per km$^2$), $m = 10$ and $\lambda_{pk} = 10$ (per km$^2$) is shown in Figure 32(b). We find that the dissemination radius using the ideal dissemination strategy scales almost linearly with time. An interesting observation is that the dissemination radius without the ideal dissemination strategy still increases linearly with $t$ for any $\eta$, although the speed is much slower. We also note that the dissemination radius is an increasing function of $\eta$. This is natural since larger $\eta$ means more nodes in the communicating state, which can help relay the packet farther. Five independent simulations of propagation speed $S(d)$ with parameters $\lambda = 230$ (per km$^2$), $m = 10$ and $\lambda_{pk} = 10$ (per km$^2$) are shown in Figure 32(c). We observe that although practical information dissemination in cognitive radio networks is much slower than that in wireless sensor networks with the ideal dissemination strategy, the propagation speed $S(d)$ is still some constant for large transmission distance $d$. This validates our theoretical results in Theorem 16. Similar to Figure 32(b), we also observe that $S(d)$ can be increased by increasing $\eta$.

4.4.4 Importance of Inter-Operation Networks


One of our original research objectives is to design recovery strategies for fast communication restoration after WMD attacks. As the recovery network is limited in available radio resources, we should maximally exploit radio utilization in the recovery network in order to deliver the large amount of information that enters the network after an attack. Our work on cognitive radio networks provides a fundamental understanding of the possibility of utilizing temporarily unused radio channels for additional information delivery, which is an important and promising technique to achieve full network utilization in the recovery network. To be specific, our work has contributed to our original research goals in the following respects.

• We have constructed a network model that captures the important communication features in cognitive radio networks with respect to the information propagation process.

• We have defined novel metrics to investigate quantitatively the radius and speed of information propagation in cognitive radio networks.

• We have derived the conditions governing information propagation capability and characterized the information propagation radius and speed in cognitive radio networks.

4.5  Failure  Propagation  in  Inhomogeneous  Networks 

Our previous study, along with other literature on performance limits in cognitive radio networks, demonstrates the feasibility of information delivery by recycling temporarily unused radio resources in the recovery network, as in cognitive radio networks (CRNs). Since the recovery network is limited in the availability of radio resources, cognitive radio technology is important to supplement the scheduling schemes and achieve full network utilization. Therefore, an interesting yet challenging question is how fast failures can propagate in such an inhomogeneous network. There are three main challenges in addressing this question. First, we must consider that there are two types of users: primary users, who are licensed by the regulator and thus have priority to utilize spectrum, and secondary users, who opportunistically access spectrum without interfering with the coexisting primary users. This demands a new study of inhomogeneous networks rather than the homogeneous networks studied in the past. Second, we consider finite as well as large CRNs, where secondary users are mobile under general mobility and primary users are either mobile or static. In other words, we need to provide a general mobility framework which captures most characteristics of the existing mobility models and takes spatial heterogeneity into account. Third, we must consider both “finite” and “large” networks, while most existing works focus on asymptotic results for large or infinitely large networks.

4.5.1  Objectives  and  Approaches 

By reviewing existing studies on CRN performance, we find that the seminal work in [52, 54] studied the packet latency in fully connected wireless ad hoc networks and showed that there exist bounds on the latency and that these bounds are tight when the number of nodes is large enough. Instead of full connectivity, the work in [55, 56] further showed that the latency scales asymptotically at least linearly with the transmission distance in wireless sensor networks when these networks are percolated. Although these results have greatly advanced our understanding of the nature of latency, they may not be applicable to CRNs because of the following challenges.

First, these results were obtained by assuming that wireless nodes are static. However, in many cognitive applications (e.g., cognitive vehicular networks, military networks), secondary users usually need to move around to achieve “better spectrum opportunities” or “more security”. In recent years, a significant effort has been devoted by the research community to understanding how mobility influences the performance of wireless ad hoc networks or sensor networks. In the seminal work [57], Grossglauser and Tse showed that mobility can improve the capacity of large wireless ad hoc networks at the cost of delay. The mobility pattern they considered assumes that nodes move according to an ergodic process that is equally likely to visit any portion of the network area. Motivated by [57], capacity-delay trade-offs have been extensively studied under various mobility models, such as the i.i.d. model [58], Brownian motion [59], the reshuffling model [60] and different variants of random walks and random waypoint [61, 62]. In all these studies, nodes are assumed to be spatially homogeneous. That is, the motion of a node uniformly covers the entire network. Later on, spatial inhomogeneity was taken into account. In [63], the network is partitioned into cells and nodes are restricted to move within a randomly chosen cell. The work in [64] studied the capacity under mobility where each node has a home point and the nodes move around their own home points. These works demonstrated that mobility plays an important role in network performance, but the question of how general mobility (instead of a specific mobility model) affects the performance, and especially the latency, of wireless networks remains open.

Furthermore, the works in [65, 54, 55, 56] only derive the latency of networks where the number of nodes is infinite or approaches infinity. Although these results enable the deployment of large general-purpose ad hoc or sensor networks, in many real applications the number of nodes is small and finite. The question of what the latency is in networks with finitely many nodes remains unanswered. Moreover, the network topologies in [65, 54, 55, 56] are homogeneous, i.e., the nodes of these networks are identical. Nevertheless, there exist two types of nodes in CRNs, and cognitive (secondary) communications are subject to primary communications. The impact of the primary communications on the latency of secondary users remains under-explored. We remark that we are not the first to study the heterogeneous topology of CRNs. For example, in [66], Ren et al. studied the challenge that heterogeneous nodes pose to the connectivity of CRNs. However, to the best of our knowledge, there is little work on the impact of topological heterogeneity on the latency of CRNs.

Therefore, we study the latency in general mobile CRNs. Particularly, we first define an abstract framework which captures most of the features of the existing mobility models and takes spatial inhomogeneities into account. Then we study the latency of a CRN where secondary users are mobile under this general framework (and primary users may be either mobile or static). We show that in finite CRNs, the dissemination latency depends on the spatial distribution and the mobility capability $\alpha$ (characterizing the region that a mobile user can reach) of secondary users. Given any spatial distribution of secondary users, there exists a critical value of $\alpha$, above which the tail of the latency distribution is bounded by some Gamma distribution, and below which the latency has a heavy tail. In addition, as the network grows to infinity, the latency asymptotically scales linearly with respect to the “distance” (characterized by the transmission hops or Euclidean distance) between the source and destination nodes if the network remains “connected” (fully connected or percolated). Moreover, we find that although the primary traffic negatively impacts the expected latency, it does not influence the “threshold” structure (on $\alpha$) of the distribution of the latency in finite CRNs or the “linearity” of the asymptotic latency (with respect to the dissemination “distance”) in large “connected” CRNs.

4.5.2  Main  Results  of  Spatial  and  Temporal  Limits 

Our contributions in this work are three-fold: a general mobility model, latency results for finite inhomogeneous networks, and latency results for large, percolated networks.

Mobility models: We consider a family of mobility models which are denoted by $\mathcal{M}(\Phi, \Psi, \alpha)$ and characterized by three parameters $\Phi$, $\Psi$, $\alpha$. Spatial heterogeneity is taken into account in $\mathcal{M}(\Phi, \Psi, \alpha)$. Specifically, we first study spatial inhomogeneity of the trajectory of a particular mobile node. We consider the scenario in which a node spends most of the time in a small region, and rarely visits areas far away from it. We model this behavior by assuming that each node $v_i$ has a home point [64], located at $v_i^h$. Nodes move “around” their home points according to independent stationary and ergodic processes. Moreover, we describe the probability density of a node $v_i$ around $v_i^h$ by a non-increasing and direction-invariant function $\Psi_i(x) = \Psi(x - v_i^h)$. We assume that $\Psi_i$ is non-zero in and only in a region characterized by a constant $\alpha$; that is, $\Psi_i(x) = \Psi(x - v_i^h) > 0$ when $|x - v_i^h| \le \alpha$ and $\Psi_i(x) = \Psi(x - v_i^h) = 0$ otherwise. $\alpha$ is called the mobility capability since $2\alpha$ characterizes the moving diameter of nodes.

To further account for spatial inhomogeneity of the home points over $\Omega_n$, we assume that each home point $v_i^h$ is associated with a fixed point $v_i^c$ which is called the center point of $v_i$. The center points are regularly placed in $\Omega_n$. For example, $\{v_1^c, \ldots, v_n^c\}$ are placed regularly at grid positions indexed by $(i, j)$ with $0 \le i \le \sqrt{n} - 1$ and $0 \le j \le \sqrt{n} - 1$ (see Figure 33). We describe the distribution of the home point $v_i^h$ around $v_i^c$ by a non-increasing probability density function $\Phi_i(x) = \Phi(x - v_i^c)$, which is assumed to be invariant in all directions.
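The two-level structure (center point, home point, node position) can be sketched in code. The example below is only an illustration under stated assumptions: the region is a square of side `side`, center points sit at the centers of a regular $\sqrt{n} \times \sqrt{n}$ grid of cells, and $\Psi$ is uniform on a disk of radius $\alpha$. The function names are hypothetical and the exact grid coordinates used in the report are not recoverable from the text.

```python
import math
import random

def grid_centers(n, side=1.0):
    """Center points on a regular sqrt(n) x sqrt(n) grid (cell centers).
    The cell-center placement is an assumption of this sketch."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    step = side / k
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(k) for j in range(k)]

def uniform_in_disk(center, radius, rng):
    """Draw a point uniformly from the disk of the given radius."""
    r = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

def sample_position(home, alpha, rng):
    """Position of a node around its home point; Psi is taken to be
    uniform on the disk of radius alpha (one admissible choice)."""
    return uniform_in_disk(home, alpha, rng)
```

A uniform density on the disk is direction-invariant and (weakly) non-increasing, so it satisfies the conditions placed on $\Psi$; other densities with support of radius $\alpha$ would serve equally well.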

Discussion: “Home points” were introduced in [64] to describe the spatial inhomogeneity incurred by the mobility of a particular wireless node. In $\mathcal{M}(\Phi, \Psi, \alpha)$, besides the “home points”, we further introduce the “center points” to model the heterogeneous spatial distribution of the home points, which characterizes the spatial inhomogeneity incurred by the heterogeneous mobility of different users. This two-level mobility model accounts for a wide range of mobility patterns. For example, if the probability density function $\Phi(x)$ is a constant function independent of $x$ (i.e., home points are uniformly distributed over $\Omega_n$), $\mathcal{M}(\Phi, \Psi, \alpha)$ reduces to the Uniform Anisotropic model in [64]. Furthermore, if the probability density function $\Psi_i(x) = \Psi(x - v_i^h) = \delta(x - v_i^h)$, where $\delta(x)$ is the Dirac impulse function, $\mathcal{M}(\Phi, \Psi, \alpha)$ reduces to the static model in [67], where nodes are assumed to be static and uniformly distributed; if $\Psi(x)$ is also a constant function independent of $x$ and $\alpha$, $\mathcal{M}(\Phi, \Psi, \alpha)$ reduces to the homogeneous mobility model in [57]; and if $\Psi(x)$ is a threshold function whose value is zero when $|x| > \alpha$ and a nonzero constant when $|x| \le \alpha$, $\mathcal{M}(\Phi, \Psi, \alpha)$ reduces to the constrained i.i.d. model used in [56].

In this work, we assume that secondary users are mobile under the general mobility pattern $\mathcal{M}(\Phi, \Psi, \alpha)$. To facilitate the study of the dissemination latency of secondary users, we further categorize $\mathcal{M}(\Phi, \Psi, \alpha)$ into three classes based on the extent of spatial inhomogeneity of home points:

• Extremely Inhomogeneous Home Points (EIHP) mobility $\mathcal{M}(\Phi_E, \Psi, \alpha)$: Home points are fixed and regularly placed over $\Omega_n$. Here $\Phi_E(x) = \delta(x)$.

• Partial Inhomogeneous Home Points (PIHP) mobility $\mathcal{M}(\Phi_P, \Psi, \alpha)$: As shown in Figure 33, center points $\{v_i^c\}_{i=1}^{n}$ partition $\Omega_n$ into $n$ subregions $\{O_i\}_{i=1}^{n}$ as Voronoi diagrams (we generally assume that $n$ is a square of some integer for simplicity). In this category, the home point $v_i^h$ is independently distributed in $O_i$. The clustered grid mobility in [64] falls into this category.

• Homogeneous Home Points (HHP) mobility $\mathcal{M}(\Phi_H, \Psi, \alpha)$: Home points $\{v_i^h\}_{i=1}^{n}$ are independently and uniformly distributed over $\Omega_n$. Here $\Phi_H(x)$ is a constant density function independent of $x$.

Figure 33: An illustration of the general mobility pattern $\mathcal{M}(\Phi, \Psi, \alpha)$.

We remark that between the extremely inhomogeneous mobility EIHP and the homogeneous mobility HHP, there exist other partially inhomogeneous mobility patterns besides the PIHP considered in this paper. However, the analysis techniques and results for PIHP can be easily extended to other partially inhomogeneous mobility patterns. Thus in this study, we only consider the PIHP case.
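The three classes differ only in how home points are drawn, which the following sketch makes concrete. It assumes a square region of side `side` and cell-centered grid placement of center points, in which case the Voronoi cell of each center point is simply its square grid cell; under PIHP the home point is drawn uniformly in that cell, which is one possible choice of $\Phi_P$. All names here are hypothetical.

```python
import math
import random

def grid_centers(n, side=1.0):
    """Cell-centered sqrt(n) x sqrt(n) grid of center points (assumed layout)."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    step = side / k
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(k) for j in range(k)]

def home_points(kind, n, side, rng):
    """Draw n home points under EIHP, PIHP or HHP mobility."""
    centers = grid_centers(n, side)
    if kind == "EIHP":
        # Home points coincide with the fixed, regularly placed center points.
        return centers
    if kind == "PIHP":
        # Each home point is uniform in the square cell around its center.
        half = side / math.isqrt(n) / 2.0
        return [(cx + rng.uniform(-half, half), cy + rng.uniform(-half, half))
                for cx, cy in centers]
    if kind == "HHP":
        # Home points are uniform over the whole region.
        return [(rng.uniform(0.0, side), rng.uniform(0.0, side))
                for _ in range(n)]
    raise ValueError("kind must be 'EIHP', 'PIHP' or 'HHP'")
```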

Notation and Problem Formulation: We denote by $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$ a mobile CRN $\mathcal{F}_{m,n}$ where secondary users are mobile under $\mathcal{M}(\Phi, \Psi, \alpha)$ and primary users are mobile under $\mathcal{M}_H$ throughout the paper. Denote $L(t)$ as the set of communication links among secondary users in $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$ at time $t$. The interference model [68] shows that $L(t)$ is dynamic as the primary and secondary users are mobile.

As our main interest lies in the dissemination latency, i.e., how fast information can be disseminated from the source secondary user to the destination secondary user, rebroadcasting and the “store-carry-and-forward” communication paradigm (also named mobility-assisted routing) are considered. Specifically, without considering the propagation delay, when the source $v_s$ broadcasts a message at time 0, all the secondary users connected to $v_s$ (i.e., secondary users which have a communication path to $v_s$ in $L(0)$) receive the message instantaneously. The destination $v_d$ may not receive this message if it is disconnected from $v_s$ at time 0. As time goes on, nodes move, and the message is passed from message-carrying secondary users to other secondary users whenever they are connected in $L(t)$. Thus, the message is disseminated throughout the network and $v_d$ may receive this message at some time as this process continues. Before the problem formulation, we define a few relevant concepts.

Definition 10. Let $l_{ij}$ denote a communication link between secondary users $v_i$ and $v_j$. The first hitting time between $v_i$ and $v_j$ is defined as $T_h(v_i, v_j) = \inf\{t \ge 0 : l_{ij} \in L(t)\}$.

Definition 11. Let $V(t)$ be the set of secondary users that have received the message at time $t$. The dissemination latency between $v_s$ and $v_d$ is defined as $T_d = \inf\{t \ge 0 : v_d \in V(t)\}$.
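Definitions 10 and 11 can be evaluated directly on a recorded sequence of link sets. The sketch below assumes discrete time slots, so that `links_per_slot[t]` plays the role of $L(t)$ as a set of undirected node pairs, and it treats delivery within a connected component of a slot as instantaneous, in line with the store-carry-and-forward description above. The slotted representation and the function names are assumptions of this illustration.

```python
import math

def first_hitting_time(links_per_slot, i, j):
    """Smallest slot t in which the direct link between i and j exists;
    math.inf if it never appears in the recorded horizon."""
    for t, links in enumerate(links_per_slot):
        if (i, j) in links or (j, i) in links:
            return t
    return math.inf

def dissemination_latency(links_per_slot, source, dest):
    """Smallest slot t at which dest holds the message when carriers
    flood every connected component they belong to in each slot."""
    if source == dest:
        return 0
    informed = {source}
    for t, links in enumerate(links_per_slot):
        changed = True
        while changed:  # propagate over multi-hop paths within slot t
            changed = False
            for a, b in links:
                if a in informed and b not in informed:
                    informed.add(b)
                    changed = True
                elif b in informed and a not in informed:
                    informed.add(a)
                    changed = True
        if dest in informed:
            return t
    return math.inf
```

A return value of `math.inf` corresponds to the event $T_d = \infty$ within the simulated horizon, i.e. the destination never receives the message.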

$T_d$ can be coupled with the first passage time in a weighted graph [55]. Next we formulate the latency problem addressed in this paper as follows.

Definition 12. Dissemination Latency $T_d$. Given $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$ with a source secondary user $v_s$ disseminating a message to the destination $v_d$ at time 0, determine:

1. in a finite $\mathcal{F}_{m,n}$, what the distribution of the dissemination latency $T_d$ is;

2. as the numbers of users $m$ and $n$ increase to infinity, whether the dissemination latency $T_d$ is scalable or not.

To  further  describe  “how  fast”  information  can  be  disseminated,  we  usually  scale  the  dissemination  latency 
Td  with  the  “distance”  between  the  source  and  destination  secondary  users. 

Definition 13. Distance $\mathcal{D}$. In $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$, three metrics can be used to characterize how far two nodes $v_i$ and $v_j$ are apart: the “distance” between secondary users $v_i$ and $v_j$ at time $t$, $d^{(t)}(v_i, v_j)$; the “distance” between $v_i^h$ and $v_j^h$, $d_h(v_i, v_j)$; and the “distance” between $v_i^c$ and $v_j^c$, $d_c(v_i, v_j)$. Here the “distance” can be any $p$-norm metric function and we consider two of the most popular metrics: “transmission hops” and Euclidean distance. Denote $\mathcal{D}$ as the “distance” between $v_s$ and $v_d$ and define $S_d = \frac{T_d}{\mathcal{D}}$. $S_d$ characterizes how fast information disseminates and is called “dissemination speed” in this paper for convenience. $\mathcal{D}$ will be specified in the particular analysis.

The key question in this study is how fast information is disseminated in both finite and large CRNs under general mobility $\mathcal{M}(\Phi, \Psi, \alpha)$. To facilitate our analysis, we first study the dissemination latency $T_d$ in CRNs where secondary users are mobile under the three subclasses of models EIHP, PIHP and HHP, respectively. Then, based on the generalization of these results, we obtain the fundamental properties of the dissemination latency $T_d$ when secondary users are mobile under the general mobility $\mathcal{M}(\Phi, \Psi, \alpha)$. We summarize our main results as follows.

Theorem 17. In a finite CRN $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$, there exists a critical value of the mobility capability $\alpha$, above which the tail of the dissemination latency $T_d$ is bounded by some Gamma distribution, and below which $T_d$ has a heavy-tailed distribution and $P(T_d = \infty) > 0$.

Remark 13. $P(T_d = \infty) > 0$ indicates a positive probability that the destination will not receive the message from the source. Thus the requirement $P(T_d < \infty) = 1$ in mobile wireless networks is equivalent to connectivity in wired networks, which is used as a prerequisite to evaluate the functionality of network applications. Moreover, a heavy tail of the dissemination latency $T_d$ implies a significant probability that it takes a long time to disseminate a message from the source to the destination. Thus not only a bounded dissemination latency (i.e., $P(T_d < \infty) = 1$) but also a light-tailed dissemination latency $T_d$ (i.e., $E(T_d) < \infty$) is required for time-critical applications. Therefore, a light-tailed distribution of $T_d$ is assumed or required in many deployments and performance studies of wireless networks in the literature. For example, the authors in [57] implicitly assume that the dissemination latency is exponentially bounded (light-tailed) so as to make their delay-capacity tradeoff analysis tractable.
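The tail behavior discussed in this remark is usually inspected through the empirical complementary distribution $P(T_d > t)$ of simulated latencies. The following minimal sketch computes it from a list of samples; the function name is hypothetical, and samples equal to `math.inf` may be used to represent runs in which the destination was never reached.

```python
import math
from bisect import bisect_right

def empirical_ccdf(samples):
    """Return [(t, P(T > t))] for each distinct sample value t, where the
    probability is the fraction of samples strictly greater than t."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(t, (n - bisect_right(ordered, t)) / n)
            for t in sorted(set(ordered))]
```

As a rule of thumb, a heavy (power-law-like) tail appears roughly as a straight line when these points are plotted on a log-log scale, whereas a light tail falls off faster than any straight line.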

Theorem 17 reveals that to achieve a light-tailed dissemination latency (note that the Gamma distribution is a type of light-tailed distribution), the mobility capability $\alpha$ of secondary users needs to be larger than some critical value, which is specifically identified in the Propositions for EIHP, PIHP and HHP mobility, respectively [68]. This result encourages and validates the existing endeavor of deploying CRNs for practical applications, including time-critical applications such as emergency networks and military networks. In addition, the result in Theorem 17 also motivates further performance studies of CRNs, for example, the delay-capacity tradeoff study.

We must emphasize that the goal of this study is to investigate the fundamental properties of the dissemination latency $T_d$ in CRNs where secondary users are mobile according to general mobility patterns. However, given more knowledge of the CRN, e.g., the specific mobility patterns, the same proof can also be used to derive more specific distributions of $T_d$.

As the network size of the CRN increases, we have the following theorem on the scalability of the dissemination latency $T_d$.

Theorem 18. We consider two types of connectivity in large CRNs: full connectivity and percolation-based connectivity. The former means that there exists a communication path between any two nodes; the latter means that there exists a large component well scattered over the entire network. In such a CRN $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$, there exists a finite constant $\kappa$ such that $P\left(\lim_{\mathcal{D}\to\infty} S_d = \lim_{\mathcal{D}\to\infty} \frac{T_d}{\mathcal{D}} = \kappa\right) = 1$.

Remark 14. Scalability has been the most fundamental concern that has so far discouraged the deployment of large wireless networks. Among the several scalability issues, perhaps the most basic one is that related to the dissemination latency. Theorem 18 demonstrates that in large connected CRNs, the dissemination latency $T_d$ asymptotically scales linearly with the initial “distance” between the source and destination, i.e., the message sent by a source reaches its destination at a fixed asymptotic speed. This result enables and verifies the deployment of CRNs for large applications, such as sensor networks.

This paper aims to understand the fundamental properties of the dissemination latency $T_d$ in CRNs under general mobility. Besides their theoretical importance, our results can be used practically not only in the initial deployment of a CRN, but also in evaluating the performance of specific CRN applications. For example, in a large deployment of a mobile CRN as a wireless sensor network, the result in Theorem 18 can be used to estimate the delay between the time at which an incoming event is sensed by some node of the network and the time at which this information is retrieved by the data collecting sink. In the next two sections, we present the proofs of Theorem 17 and Theorem 18, which study the distribution and scalability of the dissemination latency $T_d$ in finite and large CRNs under EIHP, PIHP and HHP mobility, respectively.

The Scalability of $T_d$ in Large CRNs

We have studied the distribution of the dissemination latency $T_d$ in finite mobile CRNs and proved Theorem 17. We next prove Theorem 18, which concerns the scalability of the dissemination latency $T_d$ in large mobile CRNs and states that $T_d$ asymptotically scales linearly with the “distance” between the source and destination nodes. Particularly, our results demonstrate that as the network size ($n$ and $m$) increases to $\infty$, $E(T_d) \to \infty$ even with large mobility capability $\alpha$. Thus in large CRNs (i.e., when the network size is large enough), the distribution of $T_d$ cannot be used to measure how fast information is disseminated. Therefore, we investigate, instead of the distribution, the scalability of $T_d$ in large CRNs. Specifically, we study the scaling behavior of $T_d$ with respect to the “distance” $\mathcal{D}$ between the source $v_s$ and destination $v_d$, which can be characterized by the “speed” $S_d = \frac{T_d}{\mathcal{D}}$.

Based on the distribution study of $T_d$ explained earlier, we have the implication that the tail of $S_d$ may “disappear” as the network size increases. We rigorously study in this section the scalability of $T_d$, which validates our implication (see Theorem 18). We must emphasize that our earlier derivation is based on the assumption that the numbers of nodes $n$ and $m$ are finite. However, $n$ and $m$ may approach $\infty$ in large CRNs. Therefore, the techniques and results for finite CRNs may not be applied to large CRNs directly. Instead, we use large number theory to demonstrate that $T_d$ asymptotically scales linearly with $\mathcal{D}$, i.e., $\lim_{\mathcal{D}\to\infty} S_d$ converges to some positive constant. Specifically, the main tool used is Liggett's subadditive ergodic theorem [69].
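For reference, a commonly cited form of Liggett's subadditive ergodic theorem is restated below. The array $X_{m,n}$ and the constants $c$ and $\gamma$ are generic symbols of this restatement, not notation from the report; reading $X_{m,n}$ as the dissemination latency between relay points at “distances” $m$ and $n$ from the source is an assumption about how the theorem would typically be applied here.

```latex
Let $\{X_{m,n} : 0 \le m < n\}$ be random variables satisfying
\begin{enumerate}
  \item[(i)]   $X_{0,n} \le X_{0,m} + X_{m,n}$ for all $0 < m < n$;
  \item[(ii)]  for each $k \ge 1$, $\{X_{nk,(n+1)k} : n \ge 1\}$ is a stationary process;
  \item[(iii)] the joint distribution of $\{X_{m,m+k} : k \ge 1\}$ does not depend on $m$;
  \item[(iv)]  $E[X_{0,1}^{+}] < \infty$, and there is a constant $c$ with
               $E[X_{0,n}] \ge -cn$ for every $n$.
\end{enumerate}
Then
\[
  \gamma = \lim_{n\to\infty} \frac{E[X_{0,n}]}{n} = \inf_{n \ge 1} \frac{E[X_{0,n}]}{n}
\]
exists, the limit $X = \lim_{n\to\infty} X_{0,n}/n$ exists almost surely and in $L^{1}$,
and $E[X] = \gamma$. If, in addition, the stationary processes in (ii) are ergodic,
then $X = \gamma$ almost surely.
```

Under that reading, condition (i) expresses that relaying through an intermediate point can only upper-bound the end-to-end latency, and the conclusion $X = \gamma$ corresponds to a constant limiting $S_d$.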

Note that Liggett's theorem provides a method to study the limiting behavior of a large random process. We next show how to use Liggett's theorem to study the limit of the dissemination speed $S_d$. Before we proceed, we first need to clarify our model $(\mathcal{F}_{m,n}, \mathcal{M}(\Phi, \Psi, \alpha), \mathcal{M}_H)$ for large CRNs. To study the dissemination latency $T_d$ and speed $S_d$ in large CRNs, we progressively increase the numbers of secondary and primary users $n$ and $m$ in $\mathcal{F}_{m,n}$. Note that as $m$ and $n$ increase, the homogeneously distributed primary users are asymptotically distributed as a two-dimensional Poisson point process with density $\lambda_p = \lambda n/m$. Assume that $\lambda_p$ is a constant for simplicity (i.e., $m$ and $n$ increase proportionally). Another implicit assumption is that $\lambda_p$ is not large, so that there exist enough spectrum opportunities for secondary communication. This assumption is reasonable since the low spectrum utilization of primary users is the motivation for CRNs. Similarly, to facilitate the analysis, we have proved Theorem 18 under the three subclasses of mobility EIHP, PIHP and HHP, respectively.

4.5.3  Simulation  Results 

In  this  section  we  provide  simulation  results  to  support  our  theoretical  analysis  on  distribution  and  scalability  of 
latency  in  finite  and  infinite  CRNs,  respectively.  In  these  simulations,  time  is  partitioned  into  unit  slots  and  in 
each  time  slot,  primary  users  are  uniformly  distributed  at  random  within  the  network  area  and  secondary  users 
are uniformly distributed around their home points (i.e., $\Psi$ is uniform). Furthermore, home points are uniformly
distributed around the center points under PIHP mobility (i.e., $\Phi_h$ is uniform). The transmission range r of
secondary  users  and  the  interference  range  If  of  primary  users  are  set  as  r  =  0.1  kilometer  (km)  and  If  =  0.3 
(km),  respectively.  Secondary  users  opportunistically  access  m  =  2  channels. 

We first study a finite CRN where $n = 16$ secondary users are mobile within a 2 x 2 (km^2) area (i.e.,
$\lambda = 4$ per km^2). Figure 35 illustrates the complementary cumulative distribution function (CCDF) of the
dissemination latency, $P(T_d > t)$, on a log-log scale for the EIHP, PIHP and HHP models with different values of
the mobility radius $\alpha$ and the spatial density $\lambda_p$ of primary users. The probability is calculated
based on the average of 1000 independent simulations. It is observed in Figure 35 that as $\lambda_p$ increases,
the curves move rightward, which indicates an increasing expected dissemination latency. However, regardless of the
value of $\lambda_p$, when $\alpha = 0.4$ (km), which is larger than the cutoff point under EIHP but smaller than
those under PIHP and HHP, the dissemination latency $T_d$ has a light tail under EIHP but heavy tails under PIHP
and HHP. As $\alpha$ increases to 0.6 (km), which is larger than the cutoff point in PIHP but still less than that
in HHP, the heavy tail of $T_d$ in PIHP disappears, but $T_d$ in HHP still presents a heavy tail. These results are
in good agreement with our theoretical analysis.

Figure 34: CCDF of the first hitting time $T_h(v_i, v_j)$ between neighboring secondary users $v_i$ and $v_j$.
Panels (a)-(d), log-log scale.
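The CCDF curves discussed above are estimated from simulated latency samples. The following is a minimal sketch of how such an empirical CCDF could be computed; the function name `empirical_ccdf` and the use of NumPy are assumptions for illustration and are not taken from the original simulation code.

```python
import numpy as np

def empirical_ccdf(samples):
    """Return sorted sample values x and the empirical P(T > x) at each value.

    Hypothetical helper: for the k-th smallest of n samples (k = 1..n),
    the fraction of samples strictly greater is (n - k) / n, assuming
    the sample values are distinct.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    ccdf = 1.0 - np.arange(1, x.size + 1) / x.size
    return x, ccdf

# Plotting x against ccdf on log-log axes (dropping the final zero) gives a
# curve of the kind shown in Figure 35: a roughly straight line suggests a
# heavy (power-law) tail, while a rapidly bending curve suggests a light tail.
```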

We further perform a series of simulations to validate our asymptotic results in large networks. Figures 36(a)
and 36(b) show the latency scalability in large CRNs under the EIHP and PIHP models, respectively, where the
spatial density of secondary users is $\lambda = 4$ (per km^2). As shown in Figures 36(a) and 36(b), no matter how
large the mobility radius $\alpha$ is, the dissemination latency $T_d$ scales linearly with the dissemination
distance $D$ (Manhattan distance). Moreover, we examine the latency scalability in a large percolated CRN under HHP
mobility, where the spatial density of secondary users is set to $\lambda = 200$ (per km^2) to ensure percolation;
in percolated CRNs, the dissemination latency $T_d$ also scales linearly with the dissemination distance $D$
(Euclidean distance) as $D$ increases. In addition, as shown in Figure 36, the scalability decreases as the spatial
density $\lambda_p$ increases. These observations provide a straightforward illustration of the propositions in [68].


4.5.4  Relevance  to  Original  Goals 

In summary, we study the distribution of the information dissemination latency $T_d$ in finite CRNs and the
scalability of $T_d$ in large CRNs under general mobility. We found that in finite networks, there exists a cutoff
point on the mobility radius $\alpha$ of secondary users, above which the tail distribution of $T_d$ is bounded by
some Gamma distribution and below which $T_d$ has a heavy-tailed distribution. When networks become large, the
dissemination latency $T_d$ is (linearly) scalable with respect to the dissemination distance. Our results
demonstrate that when secondary users can move in a large region, a Gamma distributed (light-tailed) latency in
finite networks, or a scalable latency in large networks, is achievable. This encourages the deployment of CRNs for
immediate communications in the aftermath of WMD attacks, and implies that inter-operation among different networks
would make a big difference in message delivery and large-scale applications.

Figure 35: CCDF of the dissemination latency $T_d$ under general mobility $\mathcal{M}(\Phi_h, \Psi, \alpha)$, on a
log-log scale. Panels: (a) $\alpha = 0.4$, $m = 4$; (b) $\alpha = 0.8$, $m = 4$; (c) $\alpha = 0.4$, $m = 8$;
(d) $\alpha = 0.8$, $m = 8$.

Figure 36: The dissemination speed $S_d = T_d/D$ (s/km) for 5 independent simulations with parameters $\lambda = 4$
and $\lambda_p = 0.5$, respectively. Panels: (a) EIHP; (b) PIHP.


5  Detection,  Localization,  and  Tracking  of  Systematic  Failures


Our primary methodology is to use algebraic topology to identify failures in terms of coverage holes. While it has
a wider application scope, we have focused in this work on sensor networks, both for the sake of concreteness and
for the generic properties of sensor networks, which make them very suitable for such basic research
into the impact of weapons of mass destruction on networks. Sensor networks are an important class of distributed
and pervasive systems, with applications in areas including environmental monitoring, health care and military
operations [70]. A unifying theme of many of these problems is to glean consensus information by systematically
combining the data collected at individual nodes, in accordance with the structure of the network. The consensus
information thus obtained characterizes the network, or the data in the network as a whole, and better represents
the underlying phenomenon than may be inferred from the data at individual nodes. This reveals the fundamental
nature of sensor networks, where global patterns emerge from simple interactions between nodes.

In order to remain robust to unprecedented scenarios and limited resources, a likely situation in the
event of a WMD attack, we are motivated to rigorously seek the minimal required information, and the tools needed
to process this information, so as to perform a certain task. Owing to limited power and communication
capabilities, and for robustness, we are motivated to develop distributed algorithms. The information which is
readily available in sensor networks is each node's knowledge of its neighboring nodes. Two nodes are neighboring
nodes if they can communicate with each other. This is equivalent to having a distributed representation of the
communication graph. With limited communication between the nodes, we may also obtain the higher order cliques in
the graph, and the sub-cliques through which they connect to other cliques.

We observe that many tasks in sensor networks may alternatively be stated in topological terms. The tasks
of detection and localization of coverage holes and worm holes are two such examples. Coverage holes may
also be caused by a large localized failure in the network, such as that caused by a WMD. The combinatorial
information mentioned above is sufficient to compute topological invariants, whose study is the subject of
Algebraic Topology [71]. We employ this theory to develop distributed algorithms to detect and localize coverage
holes and worm holes. We emphasize that this is the first work which simultaneously solves both problems,
supporting our thesis that algebraic topology offers a general framework for topological analysis in sensor
networks, and for other applications in general networks.

Perhaps the more likely scenario produced by a WMD is that of a dynamically varying network produced
by spatially and temporally correlated failures, which should be tracked for potential future mitigation of
systematic failures. In this context, we investigated the problem of tracking such systematic failures, a process
which led us to unify several existing definitions of network boundary in the literature into a single mathematical
framework, and to develop efficient algorithms to compute and update network boundaries. Tracking such failures
provides information about the nature of a failure, and helps in estimating its future impact and in developing
counter-measures.

It is often the case that surveillance of a region is performed using mobile agents, robots, UAVs, etc. (which
we will model as nodes for simplicity). In such cases, analyzing the network formed by these mobile nodes
provides us with valuable information which the nodes do not provide on their own. However, the mobile nature of
the network imposes difficult challenges in its analysis. We showed that the recently developed theory of zig-zag
persistent homology [16] can also be effectively used to analyze mobile networks, and to efficiently track which
regions lack coverage, again without the need for any coordinate information. These strategies also help analyze
and compare mobility patterns, which was not possible with previously available techniques.

5.1  Localization  of  Failures  in  Large-Scale  Networks:  A  Manifold  Learning  Approach  in  Data  Space

To assess the health of a network, we may have access to a variety of measurements at the individual nodes. The
characterization of these measurements yields a variety of information, including information about the
functionality of the network. To cope with the onset or the aftermath of a significant attack, such as one by
weapons of mass disruption/destruction, our focus primarily lies in detecting early failures and in localizing them
to infer the degree to which the topology of a network is preserved. The power of a detection strategy lies in its
versatility and its adaptability to the data $\{x_j\}_{j=1,\ldots,N}$ ($N$ is the number of nodes) which may be
available at a given point in time. The rationale of our proposed technique is that normal measurements of a
network lie on some manifold to be learned, and any interruption of or deviation from it is an indication of a
failure. In what follows we describe our technique, which applies to any arbitrary network topology, is independent
of the nature of the data measured at the network, and provides the first alert of malfunction. We are currently
working on the localization of the failure and its direct impact on the overall topology of the network.

There are a variety of manifold learning algorithms, and for simple illustration we choose to use the isomap
algorithm. This choice is due to the fact that isomap has provided successful embedding results and its
theory has been widely studied [72]. The principle and motivations of this work are, however, extensible to other
mapping (embedding) algorithms.

5.1.1  Isometric  Feature  Mapping  (Isomap) 

Isomap is a nonlinear mapping algorithm that starts from the assumption that the high dimensional data lie
on a Riemannian manifold [73]. To achieve the dimension reduction, isomap defines a mapping that aims to
preserve the geodesic distances on the initial manifold. We may describe isomap as merely an improved version
of Multidimensional Scaling (MDS) embedding where the interpoint distance is restricted to lie on the initial
manifold of the data. Given a data sample of $N$ points $X_i$, $i = 1, \ldots, N$, from a $d$-dimensional manifold
$\mathcal{M}$, where $\mathcal{M} \subset \mathbb{R}^n$ and $d < n$, we describe the different steps of the isomap
embedding algorithm in Table 2.

5.1.2  Adaptive  Isomap 

Although classical isomap shows good embedding results, it remains very unstable and sensitive to noise, to
the choice of the parameter ε, and to the distance function used prior to applying MDS. Changing the distance from
Euclidean to geodesic thus appears to be insufficient to completely respect the intrinsic geometric structure of
the initial manifold $\mathcal{M}$.

In what follows we propose to introduce a different distance matrix D_M to replace D_E in the algorithm
described in Table 2. Our objective is to define, each time, a distance that is fully dependent on the sample points
{X_i}, i = 1, ..., N. We hence rescale the data coordinates based on their distributions on $\mathcal{M}$ as well
as their correlations.

Table 2: Non-adaptive isomap algorithm.

Step 1: Construct a weighted graph G.

    Let G = {A, X}, where:
      . A is an (N x N) adjacency matrix;
      . X = [X_1, ..., X_N]^T;
    Compute D_E, the matrix of Euclidean distances between each two points in {X_i}, i = 1, ..., N.
    Choose ε, the neighborhood radius.
    for i, j ∈ {1, ..., N} do
        if D_E(i,j) < ε then
            A(i,j) = D_E(i,j);
        else
            A(i,j) = ∞;
        end if
    end for

Step 2: Compute geodesic distances on G.

    Let D_G be the matrix of geodesic distances between each two points in {X_i}, i = 1, ..., N.
    D_G = A; (initialization)
    for k ∈ {1, ..., N} do
        for i, j ∈ {1, ..., N} do
            D_G(i,j) = min(D_G(i,j), D_G(i,k) + D_G(k,j));
        end for
    end for

Step 3: Apply MDS on D_G.

Table 3: Description of the learning step (new adaptive distance).

Step 0: Compute D_M, the new distance on $\mathcal{M}$.

    Choose ε_1, the neighborhood radius for manifold learning,
    and ε_2, the neighborhood radius for the construction of G.
    for i ∈ {1, ..., N} do
        Y_i = X_i^T; (initialization)
        for j = 1, ..., N do
            if D_E(i,j) < ε_1 then
                Y_i = [Y_i; X_j^T];
            end if
        end for
        Σ_i = cov(Y_i), cov(·) being the covariance matrix;
    end for
    for i ∈ {1, ..., N} do
        for j ∈ {1, ..., N} do
            D_M(i,j) = (X_j - X_i) Σ_i^{-1} (X_j - X_i)^T;
        end for
    end for
    Set ε = ε_2; D_E = D_M;
    go to Step 1. (See Table 2)


Because  it  relies  on  a  learning  procedure,  we  refer  to  this  modified  isomap  as  adaptive  isomap  algorithm.  We 
start,  therefore,  the  algorithm  of  Table  2  with  a  learning  step,  i.e.  Step  0,  as  described  in  Table  3. 
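The steps of Tables 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the project: the function names, the regularization term `reg` (added so the local covariance stays invertible), the vectorized Floyd-Warshall update, and the use of classical MDS via eigendecomposition are all choices made here. Each point is included once in its own neighborhood when estimating the local covariance.

```python
import numpy as np

def euclidean_distances(X):
    """Pairwise Euclidean distance matrix D_E for the rows of X (N x n)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def adaptive_distance(X, eps1, reg=1e-9):
    """Step 0 (Table 3): D_M(i,j) = (X_j - X_i) Sigma_i^{-1} (X_j - X_i)^T."""
    N, n = X.shape
    D_E = euclidean_distances(X)
    D_M = np.zeros((N, N))
    for i in range(N):
        Y_i = X[D_E[i] < eps1]  # eps1-neighborhood of X_i (contains X_i itself)
        if Y_i.shape[0] < 2:
            raise ValueError("eps1 too small to estimate a local covariance")
        # reg is an assumption of this sketch: it keeps Sigma_i invertible.
        Sigma_i = np.atleast_2d(np.cov(Y_i, rowvar=False)) + reg * np.eye(n)
        diff = X - X[i]
        D_M[i] = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma_i, diff.T).T)
    return D_M

def isomap(X, eps, p=2, D=None):
    """Steps 1-3 (Table 2). Pass D=D_M to start from the adaptive distance."""
    N = X.shape[0]
    D = euclidean_distances(X) if D is None else D
    # Step 1: eps-neighborhood graph; missing edges get infinite weight.
    A = np.where(D < eps, D, np.inf)
    # Step 2: all-pairs shortest paths (Floyd-Warshall, vectorized over i, j).
    D_G = A.copy()
    for k in range(N):
        D_G = np.minimum(D_G, D_G[:, [k]] + D_G[[k], :])
    if not np.isfinite(D_G).all():
        raise ValueError("neighborhood graph is disconnected; increase eps")
    # Step 3: classical MDS on the geodesic distances.
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (D_G ** 2) @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:p]
    Z = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    return Z, D_G
```

Note that D_M as defined in Table 3 is not symmetric in general, because Σ_i depends on i. A caller of this sketch may wish to symmetrize it, for example with `0.5 * (D_M + D_M.T)`, before passing it as `D`; that step is an assumption and is not part of Table 3.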

5.1.3  Performance  comparison 

In this section, we qualitatively and quantitatively compare the performances of the two versions (adaptive and
non-adaptive) of isomap embeddings. To that end, we start by defining a performance measure. We then simulate
different classical examples of manifolds to embed in a lower dimensional space. The choice of our examples is
such that one may visually inspect and verify the properties and the intuitions behind each technique.

We choose the residual variance $\rho$ to be our performance indicator. In further applications, we will show that
we may use it to investigate the topological structure of a manifold. The residual variance $\rho$ is defined in (58):

$$\rho = 1 - \left(\mathrm{corrcoef}(D_X, D_Z)\right)^2 \qquad (58)$$

where:

•  $D_X$ is the distance matrix for the initial data $X_i$. For the isomap embedding technique, this matrix is
the geodesic distance matrix $D_G$ that is fed into MDS. We note that $D_G$ changes depending on the first
distance matrix used to identify the neighborhood. For the classical non-adaptive isomap, this distance was
simply Euclidean, i.e., $D_E$, while for the adaptive case we defined it as $D_M$. This difference is crucial in
comparing the performances of the two versions of the isomap algorithm.

•  $D_Z$ is the distance matrix for the final (embedded) data of reduced dimension $p$ ($p < n$). We take
advantage of the simplicity of the geometries of our examples (swiss rolls, hemispheres, parallel sheets),
take $p$ equal to 2, and consider $D_Z$ to be Euclidean. It becomes trivial to visually verify the accuracy
of our assumption. For more complex geometrical and topological structures, one would, however, need to
compute geodesic distances on the new manifold embedded in $\mathbb{R}^p$.

•  $\mathrm{corrcoef}(\cdot, \cdot)$ is the linear correlation coefficient. If we denote by $\{d_x\}$ and $\{d_z\}$
the two ordered sets of the distances (matrix elements) of $D_X$ and $D_Z$, respectively, then:

$$\mathrm{corrcoef}(D_X, D_Z) = \frac{\sigma_{xz}}{\sigma_x \sigma_z} \qquad (59)$$

$\sigma_{xz}$ being the covariance between the two sets $\{d_x\}$ and $\{d_z\}$, and $\sigma_x$ and $\sigma_z$
being the standard deviations of $\{d_x\}$ and $\{d_z\}$, respectively.
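As an illustration of (58) and (59), the residual variance can be computed directly from the two distance matrices. The sketch below is hedged: the function name is hypothetical, and restricting the comparison to the upper triangle (so that each pair of points is counted once and the zero diagonal is excluded) is an assumption made here, since the text only speaks of the ordered sets of matrix elements.

```python
import numpy as np

def residual_variance(D_X, D_Z):
    """Residual variance 1 - corrcoef(D_X, D_Z)^2 between two distance matrices.

    Assumption of this sketch: only the strict upper triangle is used, so each
    pair of points contributes one distance to the ordered sets.
    """
    iu = np.triu_indices_from(D_X, k=1)
    r = np.corrcoef(D_X[iu], D_Z[iu])[0, 1]  # Pearson correlation coefficient
    return 1.0 - r ** 2
```

A value close to 0 means the embedded distances are almost a linear function of the input (geodesic) distances; a value close to 1 means the embedding has lost most of that structure.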

As stated earlier, in the presence of noise, the performance of non-adaptive isomap drops drastically.
We hence test how well adaptive isomap performs on the same noisy datasets. We use the data point samples
(omitted due to file size) and add Gaussian noise to them, with standard deviations varying between 0% and 8%
of the orthogonal distance between the two parallel sheets, the normal distance between two consecutive levels
of the swiss roll, and the orthogonal distance between the poles of the two adjacent hemispheres.

In summary, a manifold learning approach is explored to identify the locations of attacks or failures by
detecting variations in a massive data space, under the assumption that many sensors (e.g., road-side monitoring
sensors) are used to collect information for normal operation of a network. Regardless of the contents being
collected by sensors, it is important to localize the incidents by examining the data set. As an initial effort,
we have demonstrated that manifold learning is an effective approach to reduce the dimension of a high-dimensional
data set and to detect points of failure with abnormal data.

5.1.4  Coverage  hole  detection  and  localization 

We consider a problem where a set of sensors, with the capability to communicate with neighboring sensors and
perform simple arithmetic operations, are deployed in a region for surveillance and monitoring purposes. We say
that there is a hole in the coverage if any part of the intended region of surveillance is not in the range of any
sensor. This might be due to failure of deployment, or intentional sabotage. One such scenario is shown in Figure
37. Our goal is to detect and locate these holes in the coverage in a robust, resource efficient, fast, and
distributed manner. The problem is stated more precisely as follows.

Problem Formulation. Consider $N$ sensor nodes randomly deployed in a region of interest $\mathcal{R}$ in a plane.
We denote the collection of all the nodes as the set $V = \{v_i\}$. Each node $v_i$ can communicate with a set of
neighboring nodes $\mathcal{N}_i$ in its vicinity. A communication graph $G = (V, E)$ is thus formed as the
collection of the set $V$ together with the set of edges $E = \{(v_i, v_j)\}$, where $(v_i, v_j) \in E$ if and only
if $v_i$ and $v_j$ can communicate with each other. A graph is said to be a unit disk graph, denoted by
$G_1 = (V, E_1)$, when $(v_i, v_j) \in E_1$ if and only if $d(v_i, v_j) \le 1$. A graph is said to be an
$\epsilon$-quasi unit disk graph, denoted by $G_1^{\epsilon} = (V, E_1^{\epsilon})$, if 1)
$(v_i, v_j) \in E_1^{\epsilon}$ whenever $d(v_i, v_j) < 1 - \epsilon$, 2) $(v_i, v_j) \in E_1^{\epsilon}$ with
probability 0.5 whenever $1 - \epsilon \le d(v_i, v_j) \le 1$, and 3) $(v_i, v_j) \notin E_1^{\epsilon}$ whenever
$d(v_i, v_j) > 1$. We model the graph as a quasi-unit disk graph due to imperfections in the
antenna's directional sensitivity.

Figure 37: Problem context for localizing coverage holes. The union of the balls around each sensor
(a) is called the coverage area (b). Holes in the coverage area imply that the corresponding region remains
unmonitored, and hence indicate a failure in coverage. The only information available is the proximity
neighborhood of each node, as summarized by the communication graph (c).
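To make the quasi-unit disk model concrete, the following sketch samples the edge set of such a graph from node positions. It is an illustration only: the function name, the use of a seeded `random.Random` for reproducibility, and the treatment of the boundary cases (distances exactly equal to 1 - ε or 1 fall in the probabilistic band) are assumptions made here.

```python
import math
import random

def quasi_unit_disk_graph(points, eps, rng=None):
    """Sample the edge set of an eps-quasi unit disk graph.

    points: list of (x, y) positions; eps: width of the uncertain band.
    Pairs closer than 1 - eps are always connected, pairs in the band
    [1 - eps, 1] are connected with probability 0.5, and pairs farther
    than 1 are never connected.
    """
    rng = rng or random.Random()
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < 1 - eps or (d <= 1 and rng.random() < 0.5):
                edges.add((i, j))
    return edges
```

With `eps = 0` the construction reduces to the ordinary unit disk graph, except that pairs at distance exactly 1 are still connected only with probability 0.5 in this sketch.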

Let $R_c^i$ denote the coverage area of the sensor on node $v_i$; the union $R_c = \bigcup_i R_c^i$ is the coverage
area of the network. The existence of coverage holes is determined by whether or not the following set relation holds:

$$\mathcal{R} \subseteq R_c = \bigcup_i R_c^i$$


Figure 38: Results of the coverage hole localization algorithm, panels (a)-(d). The input to the algorithm is
the Rips complex (constructed locally from the communication graph), and the results of the localization are cycles
around holes, as shown in (b), (c), and (d).

To better reflect the challenges faced under a WMD attack scenario and to be faithful to the technical
difficulties of deploying a large number of sensors, we further impose the following restrictions:

1.  There is no central location which can gather all the data from the network for processing.

2.  Each sensor only knows the identities of sensors in its geographic proximity, and nothing else.

3.  The sensors are not aware of their global location, and there is no means to obtain this information.

4.  The reception strength of the antennas is not uniform in all directions owing to imperfections in
manufacturing and, more importantly, due to the presence of obstacles and fading.

We are happy to report that even under these restrictions, we have developed a provably correct algorithm [74]
which outperforms the state-of-the-art algorithms [75] for similar problems (often with less stringent restrictions).

Results. The output of our algorithm for an example sensor network is shown in Figure 38. The leftmost figure
shows the input network with coverage holes, and the remaining figures show the locations of the coverage
holes as determined by our algorithm. The preliminary versions of these results were published in [76], and
the complete work in [74], where we analyze the complexity in detail. To give a sense of the comparison, the
complexity of our algorithm is the same as that of computing a starting point for the state-of-the-art algorithm
for solving the same localization problem under the restrictions assumed here.

5.2  Failure  Identification  and  Localization 

The infrastructure of computing systems is rapidly transitioning from centralized systems to distributed and
pervasive systems. A very important category of such systems is sensor networks, which find applications in areas
including environmental monitoring, health care and military operations [70]. There has been considerable
research interest in this field over the past decade, addressing problems including node localization [77],
distributed compression [78], probabilistic inference and motion tracking. A unifying theme of many of these
problems is to glean consensus information by systematically combining the data collected at individual nodes in
accordance with the structure of the network. The consensus information thus obtained characterizes the network,
or the data in the network, as a whole and better represents the underlying phenomenon than can be inferred from
the data at individual nodes. This reveals the fundamental nature of sensor networks: they are essentially complex
networks in which global patterns emerge from simple interactions between nodes. From an engineering perspective,
the fundamental challenge in sensor network applications is to cope with limited resources: a limited
communication capability of nodes (i.e., nodes can communicate only with their neighbors), limited power and
limited memory. Furthermore, sensor networks are often deployed in inaccessible locations and situations where
maintenance is impractical; this makes careful use of exhaustible resources such as power imperative.

This  unique  combination  motivates  the  use  of  techniques  such  as  topological  analysis,  which  directly  extracts 
global  information  without  being  overly  dependent  on  the  local  structure  and  thereby  alleviating  the  need  for 
excessive  resources. 

5.2.1  Objectives  and  Approaches 

There are two main categories of techniques in the literature relating to the analysis of topology: Morse
theoretic and algebraic topological techniques. In our endeavor to analyze the topology of spaces of interest, we
predominantly followed the algebraic topological methodology. Algebraic topology, in contrast to a Morse theoretic
approach, is a relatively more direct technique for analyzing the topology of a space, which is easily expressed in
terms of algebraic objects. There is an extensive literature in algebraic topology [79, 80] which shows a very
strong relationship between topological spaces and their algebraic counterparts. This also enables us to draw from
the extensive pool of knowledge from algebra to develop fast and efficient algorithms. The assigned algebraic
objects have the following important properties:

•  They  directly  reflect  the  topological  features  of  an  underlying  space. 

•  They  are  invariant  to  continuous  deformations. 


Figure  39:  A  high  level  schematic  of  Algebraic  Topology. 

The use of algebraic topology for the coverage problem was introduced in [81, 82, 83], which include a
demonstration of the distributed computation of homology groups, and [84] attempts to localize the holes by posing
the localization as an optimization problem. We further exploit the spatially constrained nature of the coverage
holes and formulate a very effective “divide and conquer” algorithm.
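To illustrate what the homological approach computes, the sketch below evaluates the first Betti number of the Rips complex (up to triangles) built from a communication graph; a nonzero value signals a cycle that is not filled in by triangles, which is the combinatorial signature of a candidate hole. This is a centralized, illustrative computation with real coefficients, not the distributed algorithm of [74]; the function name and the dense-matrix rank computation are assumptions made here.

```python
import itertools
import numpy as np

def betti_1(num_nodes, edges):
    """First Betti number of the Rips 2-complex of a graph (real coefficients).

    beta_1 = (#edges - rank d1) - rank d2, where d1 maps edges to nodes and
    d2 maps triangles (3-cliques) to edges.
    """
    edges = sorted({tuple(sorted(e)) for e in edges})
    index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in itertools.combinations(range(num_nodes), 3)
                 if (t[0], t[1]) in index and (t[0], t[2]) in index
                 and (t[1], t[2]) in index]
    # Boundary of an edge (u, v) with u < v is v - u.
    d1 = np.zeros((num_nodes, len(edges)))
    for k, (u, v) in enumerate(edges):
        d1[u, k], d1[v, k] = -1.0, 1.0
    # Boundary of a triangle (a, b, c) is (b, c) - (a, c) + (a, b).
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[index[(b, c)], k] = 1.0
        d2[index[(a, c)], k] = -1.0
        d2[index[(a, b)], k] = 1.0
    rank1 = np.linalg.matrix_rank(d1) if edges else 0
    rank2 = np.linalg.matrix_rank(d2) if triangles else 0
    return len(edges) - rank1 - rank2

# A ring of six nodes encloses an uncovered region (one independent cycle);
# adding a hub node connected to every ring node fills it with triangles.
ring = [(i, (i + 1) % 6) for i in range(6)]
hub = ring + [(i, 6) for i in range(6)]
```

For the ring, `betti_1(6, ring)` evaluates to 1, and for the ring with a hub, `betti_1(7, hub)` evaluates to 0. Whether a nonzero Betti number corresponds to a true coverage hole depends on the relation between communication and sensing ranges assumed in the cited works.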

5.2.2  Localization  of  Failures 

Main Results: We demonstrate the merits of such analysis by exploiting these tools to solve two specific important
problems:

•  Coverage hole detection and localization

•  Worm hole attack detection and localization

The problem of coverage hole localization seeks to identify a closed contour in the network which encloses
a specific region of interest. This region can be the part of the sensor network which has been destroyed by a
WMD attack, or a region indicating some other undesirable parameters, for example, a radiation level above a given
threshold. We further demonstrate the effectiveness of such topological analysis by looking at a more complex
attack on the network which fundamentally alters the perceived topology of the network in order to significantly
disrupt its usage.

Given a set of sensors monitoring a given region of interest, with each sensor having a certain coverage area,
we want to confirm whether each point in this region is covered by at least one sensor. Further, if this is not the
case, we want to find the set of sensors surrounding the region which is not covered (coverage hole).
A computer simulation of the proposed algorithm is shown in Figure 40.

Figure 40: Simulations showing convergence to hole locations. Panels: (a) communication graph G, with diameter
nodes as circles and boundary nodes as squares; (b) Rips complex extracted from the communication graph;
(c) active nodes after first segmentation, with segments differentiated by colors; (d) active nodes after two
iterations; (e) active nodes after three iterations; (f) active nodes after four iterations.

Problem Statement: We wish to first detect whether a worm hole attack is taking place in the network. Further,
if an attack is detected, we wish to locate precisely the sensor nodes surrounding the attack nodes so as to
quarantine them and nullify the attack.


5.2.3  Worm  hole  detection  and  localization 

A worm hole attack is typically launched by two colluding external attackers who do not authenticate themselves
as legitimate nodes to the network. When initiating a wormhole attack, an attacker overhears packets in one part
of the network and tunnels them through the wormhole link (external to the network) to another part of the network.
This effectively generates a false impression that the original sender is present in the neighborhood of the remote
location. Many routing algorithms depend on the nodes' ability to accurately discover their neighboring nodes.
The nodes ordinarily broadcast a beacon (including ID and other information) to their neighbors. If
the neighbor discovery beacons are tunneled through wormholes, the good nodes will get false information about
their routes. Although finding faulty routes is in itself a problem, worm holes can cause further critical security
threats using these faulty routes. The resulting effect of wormholes on routing is to include a worm hole link
in most of the computed routes. This in turn gives an attacker complete control over the transmission of large
amounts of data, which may be selectively or completely dropped.

Figure 41 demonstrates this change in topology for a grid network. Many important functions such as routing
and localization [85, 86] can be severely affected by these attacks.

Figure 41: Deformation in network structure because of a worm hole. The cycle created is shown in red.
Panels: (a) network grid with links caused by a wormhole; (b) the same grid shown in 3D to respect distance
properties measured as hop distances.


Figure 42: Worm hole localization algorithm. When an edge is removed from the shortest cycle, and an alternate path is sought between the incident nodes, this path consists of all other edges in the shortest cycle (b) only when the edge removed is in the vicinity of the wormhole attack position. (a) $v_1, v_2$ not in the vicinity of X and Y: a shortest path can be found in the network surrounding the nodes removed. (b) $v_1, v_2$ in the vicinity of X and Y: the alternative shortest path includes all the nodes in the cycle.

Assume that a worm-hole attack is launched by two colluding nodes at positions $p_1$ and $p_2$ inside a network. Denote the neighborhood regions around these points by $N_1$ and $N_2$. The two attacking nodes may receive all the packets transmitted from within their respective neighborhoods, and relay them to the other. Denote by $V_1$ and $V_2$ the sets of vertices (sensor nodes) which lie in $N_1$ and $N_2$ respectively. The result of a worm-hole attack will be to produce a complete bipartite graph with $V_1$ and $V_2$ as the two classes of vertices. The problem of localizing a worm hole attack hence reduces to identifying the sets $V_1$ and $V_2$. Figure 42 shows an example of a worm hole attack. In this case, X and Y are the positions $p_1$ and $p_2$, and the neighborhoods A and B are $N_1$ and $N_2$ according to our definition.
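The effect described above can be illustrated with a minimal sketch. It assumes a square grid network and hypothetical helper names (`grid_graph`, `add_wormhole`, `hop_distance`) that are not part of the original work; it only shows how adding the complete bipartite links between two vertex sets collapses hop distances.

```python
from collections import deque
from itertools import product


def grid_graph(n):
    """Adjacency sets of an n x n grid (hypothetical stand-in for the sensor network)."""
    adj = {(r, c): set() for r, c in product(range(n), repeat=2)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj


def add_wormhole(adj, v1, v2):
    """Add the complete bipartite links between vertex sets v1 and v2."""
    for u, w in product(v1, v2):
        adj[u].add(w)
        adj[w].add(u)


def hop_distance(adj, src, dst):
    """Breadth-first search hop count from src to dst, or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None
```

For example, on a 10 x 10 grid with the two neighborhoods placed at opposite corners, the hop distance between the corners drops from 18 to 1 once `add_wormhole` is applied, which is the topological deformation that the localization step tries to detect.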
By a simple observation, we show that the algorithm to find a coverage hole may be extended to address this problem. We note here that, in order for the proposed hole localization algorithm to work in the presence of a wormhole attack, we do not require the attacking nodes to perform any computation or follow any protocol. We assume that the attacking nodes “carry out their tasks for coverage hole detection”, i.e., relay the broadcast signals from one position to another. This effectively creates a virtual link in the network, and the nodes in the network perform all the computations. An additional assumption here is that the behavior of the attacking nodes does not change during the run time of our algorithm. Figure 42 demonstrates the workings of the algorithm, where it is successful in correctly identifying the worm hole location. These results are publicly disseminated in [87].

For the sake of clarity, we note that the worm hole localization algorithm does not share the mathematical rigor of its coverage hole counterpart, but remains a heuristic. In the dissertation [87], we also provide an example where it is impossible to distinguish between coverage holes and worm holes in the absence of geometric information. Owing to this lack of rigor, we analyze the empirical performance of the algorithm in distinguishing between a coverage hole and a worm hole, as shown in Figure 43. As seen, the algorithm achieves good detection and false alarm rates.


Figure 43: Experimental results show that the algorithm is able to correctly identify both the coverage holes and worm holes about 85% of the time. (a) Misclassification rate for worm holes. (b) Misclassification rate for coverage holes.


5.2.4  Summary  and  Benefits 

As  one  of  the  most  important  objectives  of  this  project,  we  aim  to  develop  new  models  and  approaches  to  identify 
the  impact  of  multiple  failures.  To  this  end,  we  have  achieved  the  following  milestones. 

•  Demonstrated  that  this  is  fundamentally  a  topological  problem  and  formulated  the  problem  in  a  generalized 
topological  framework.  This  involves  generalizing  the  notion  of  a  graph  into  a  simplicial  complex. 

•  We  used  tools  from  Algebraic  Topology  to  effectively  solve  the  problem. 

•  We  introduced  a  novel  distributed  “divide  and  conquer”  algorithm  which  converges  to  desired  solution 
significantly  faster  than  the  previously  proposed  methods  in  the  literature. 
•  We also achieve a large saving in the communication energy required by putting half the active nodes to rest in each iteration.

•  We  solve  the  problem  in  the  same  generalized  framework  developed  to  solve  the  coverage  hole  problem 
thus  demonstrating  the  effectiveness  of  our  framework  for  a  myriad  of  topological  problems  in  sensor 
networks. 

•  We  developed  a  distributed,  efficient  and  fast  algorithm  to  detect  the  attack  and  localize  the  attack  position. 

5.3  Detection  and  Tracking  of  Random  Failures 

Due to their ease of availability and versatility of applications, sensor networks have received much attention in the literature in the past decade. Owing to their low cost and pervasive characteristics, sensor networks are ideal for deployment in, and monitoring of, hazardous environments such as volcanic eruptions, wildfires, landslides, war zones, etc. One feature common among these environments is that they produce events which cause correlated failures in both space and time. Being able to detect and track such failures becomes crucial both for the sake of tracking itself and for any emergency response thereafter. Further, any such tracking algorithm should be simple and very fast so as to decrease the response time. In this section, we present a simple algorithm which meets the above needs.

On the deployment side, the authors of [88, 89] present a new fault tolerance metric for deployment called region-based connectivity. Under the usual $k$-connectivity criterion, the network is designed so as to remain connected even when $k - 1$ nodes fail, where the nodes may be taken from any part of the network. Region-based $k$-connectivity imposes the additional condition that all these $k - 1$ nodes are within a closed region. This type of failure depicts, for example, the scenario of WMD attacks. They show that a design using this criterion requires fewer nodes than required for standard $k$-connectivity.

In [90], the author presents a statistical test to identify whether a given set of node positions is likely to be produced by a random deployment of nodes. If the given set of node positions does not fall in the confidence region, a systematic failure is detected. The author in [90] assumes the availability of localization information so as to compute the Voronoi diagram for the given points. Further, he also assumes that deployment is in a convex region. We make neither of these assumptions.

For topological analysis, the body of work on this topic deals with detecting the failure event region and analyzing its topological changes. In [91, 92, 93], the authors present some in-network heuristics to detect hole formation and collapse, and other topological changes. Further work in [94, 95] formulates and solves the above problem in a concrete mathematical setting using low complexity distributed algorithms.

In contrast to the above-mentioned articles: 1) our work deals with catastrophic situations where the nodes inside the region of interest fail completely or are unable to communicate with any other node; 2) such nodes cannot participate in response mechanisms, which makes the problem more complicated, as we can only use the information available in the neighboring nodes to perform any task; 3) our formulation does not assume any hardware with capabilities to sense the phenomenon causing the failure; and 4) it is robust to random failures of nodes. The work in [96] presents a distributed algorithm to track dynamic boundaries but assumes co-ordinate information and no node failures.

Dynamic curve tracking has been extensively studied in image processing, a domain which, from our perspective, deals with centralized processing with complete co-ordinate information. Significant work in this area can be grouped as 1) active contour analysis with linear/non-linear evolution models using parametric/non-parametric Kalman filtering techniques, and 2) variational methods [97, 98], where the curve/boundary of interest is obtained as a solution to a certain functional optimization. Applications range widely, including landslide tracking [99], volcanic material flow analysis, object tracking [100], etc. These methods cannot be directly adopted here, as they assume co-ordinate information and centralized processing.

5.3.1  Objectives  and  Approaches 

We  aim  to  develop  a  low  complexity  distributed  algorithm  for  detecting  and  tracking  such  failures.  We  assume 
that  nodes  inside  the  failure  region  are  either  destroyed  or  unable  to  communicate  with  any  other  node.  The 
algorithm  presented  here  does  not  assume  any  co-ordinate  information  for  the  nodes.  We  evaluate  the  algorithm 
using  simulations. 

We consider a region $R$ in which a set of nodes $V$ are deployed. For a communication radius $r_c$, a communication graph $G_{r_c} = (V, E)$ is induced on $V$, where $(v_1, v_2) \in E \iff d_{12} = |(v_1, v_2)| < r_c$. An edge $(v_1, v_2)$ in $G_{r_c}$ implies that $v_1$ is within the communication region of $v_2$ (and vice versa), or in other words, can be observed by $v_2$. In what follows, we use $v_i$ to denote the node with index $i$ and also its position in $R$.
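As a minimal sketch of this construction, assuming node positions are known to the simulator (the algorithm itself does not use them) and using a hypothetical function name, the induced communication graph can be built by a pairwise distance test:

```python
import math
from itertools import combinations


def communication_graph(positions, r_c):
    """Return the edge set E of the graph induced by communication radius r_c.

    positions: list of coordinate tuples, one per node (simulation-only input).
    An edge (i, j), i < j, is present when the Euclidean distance is below r_c.
    """
    edges = set()
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if math.dist(p, q) < r_c:
            edges.add((i, j))
    return edges
```

The quadratic pairwise loop is adequate for networks of a few hundred nodes, such as the 500-node simulation reported later; a spatial index would be a natural replacement for larger deployments.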

Definition 14. For a time-evolving, $C^2$ smooth, simple and closed curve $C(s,t) : S \times \mathbb{R} \to R$, a systematic failure is defined to be occurring if

•  $\forall t > 0$, $v_i \in C(s,t) \Rightarrow v_i$ fails at time $t$.

•  Let $F(t)$ denote the set of nodes failed at or before time $t$ and in $\mathrm{Int}(C(s,t))$; then $|F(t)| > 2$.

The time evolution of the curve $C(s,t)$ can be specified by assigning a velocity vector at each point of the curve in the normal direction. We consider the vector only in the normal direction, as any tangential component will only result in a reparameterization of the curve.

5.3.2  Tracking  Random  Failures  using  A  Graphic  Tool 

We use the representation on the graph, in which the tracking problem is generally formulated as the estimation of both $C(s,t)$ and $\nu(s,t)$. The accurate estimation of both these quantities is not possible in our scenario because of the lack of co-ordinates. Tracking can only be performed with reference to the graph itself, i.e., the position of $C(s,t)$ can be given by specifying the nodes in the graph which are “close” to it. But the graph does not provide the means to accurately determine $C(s,t)$. For example, Figure ?? shows a snapshot of a failure event, where the failure region is indicated by the ellipse. As can be seen in this example, a good part of the curve does not have any nodes (which have not yet failed) close to it. Owing to these inherent problems, we formulate the problem as follows: for each node $v_i$ and a given time $t$, we estimate the time $T_i(t)$ after which the node fails. Specifying the time remaining at each node captures both the distance to the phenomenon causing the failure and the speed at which it is approaching, and also gives us a measure of risk at each node.

Main Results: We describe the tracking algorithm, which computes and maintains the estimates of time to failure at each node. As we assume that the sensors do not necessarily have any hardware equipped to sense the phenomenon causing the failure, the only observations that can be made are nodes failing, and the time at which a node fails is our event of interest. When a node fails, its neighbors decide whether the failure represents a systematic failure or a random failure. This decision process will be described. If a systematic failure is detected, the neighbors estimate the speed with which this failure is happening (locally) and propagate this information into the network. Given this speed, we describe later in this part a methodology to determine the speed in other parts of the network. We adopt a simple prediction rule, i.e., the speed of approach at any node will remain constant during the time interval between two failures.

Our simulations show that this is a good approximation, and we leave the more detailed modeling of this evolution as a part of our future work. As such, we adopt here the predict-and-update methodology used in a Kalman filter. We cannot use a Kalman filter directly, as the evolution statistics are difficult to obtain, which is discussed in more detail next. In what follows, we denote the time between any two consecutive node failures as $\Delta t$ and the difference in failure times of nodes $v_i$ and $v_j$ as $\Delta t_{ij}$. Each node $v_i$ maintains an estimate of time to failure $T_i(t)$ and a speed of approach $s_i(t)$. When the node $v_i$ fails at time $t$, it computes an observed speed $s_i^o(t)$ and uses this observation to compute its estimate $s_i(t)$. Note that here and in what follows, we adopt for convenience the notation “a failed node computes”, while these computations are actually being performed in the neighboring nodes.

Failed nodes: Given the time difference between two node failures, the primary difficulty in estimating the average speed is that the distance between these two nodes is computed using the graph distance (path length), which is only approximate. The work in the literature presents important statistical results on random graphs in limiting cases ($n \to \infty$), but little is known about the statistics of path lengths in random graphs with a relatively small number of nodes. As such, we compute speeds by considering the time difference only between adjacent nodes. More importantly, we have to wait for a node to fail to make a computation, as this is the only way to measure the time the failure phenomenon takes to reach that node.

When a node $v_i$ fails at time $t$, the a priori estimate $s_i^-(t)$ is computed using a simple prediction model as given in Equation 60. In the event that there are two other neighbors $v_j$, $v_k$ which failed previously, such that $(v_i, v_j, v_k)$ forms a clique (simulations show that this is very likely to happen), we can compute the speed of approach with reasonable accuracy. Figure 44 shows the construction for this computation. The red arcs show the part of the evolving curve passing through the nodes. If $(v_i, v_j, v_k)$ form a clique, we can compute the angle $\theta$ as we know all the sides of the triangle. The chord $v_j X_1$ can be approximated as being perpendicular to $o v_i$; let $r' = |p X_1|$, then

$$ \frac{r'}{d_1} = \frac{r}{d_1}\left(1 - \cos(\phi)\right) = \frac{r}{d_1}\left(1 - \sqrt{1 - \frac{a^2 \sin^2(\theta_1)}{r^2}}\right), $$

where $\phi$ is the angle at $o$ in the triangle $o v_j v_i$ and the second step is obtained by applying the sine law to this triangle. This immediately shows that as $r \to \infty$, $r'/d_1 \to 0$ and therefore, for large $r$, the angle $\angle v_i X_1 v_j$ is very close to $\pi/2$. Approximating both the chords $v_j X_1$ and $v_k X_2$ as being perpendicular to the line $o v_i$, we have the following equations:


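$$ |v_i X_1| = d_1 = a\cos(\theta_1), \qquad |v_i X_2| = d_1' = b\cos(\theta_2), $$

$$ \theta_1 + \theta_2 = \theta, \qquad \frac{d_1}{\Delta t_{ij}} = \frac{d_1'}{\Delta t_{ik}}, $$

where the last equation arises from the fact that the speed is assumed to be constant during the interval $\Delta t_{ij}$. Solving, we have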


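$$ \tan(\theta_1) = \frac{\dfrac{a\,\Delta t_{ik}}{b\,\Delta t_{ij}} - \cos(\theta)}{\sin(\theta)}, \qquad d_1 = a\cos(\theta_1), $$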


and the speed $s_i^o(t)$ is obtained as being equal to $d_1/\Delta t_{ij}$. As we have the complete information to compute the speed, the estimate at time $t$ will be equal to $s_i^o$, as given in Equation 61. In case we do not have a triangle of failed nodes as assumed, we use Equation 62. The lengths of the edges used in the computation are always greater than the closest distance between the node and the curve. As a result, we end up overestimating the speed. The min in Equation 62 serves to compensate for this effect and, further, since it might not be very accurate, we use a simple smoothing function as given in Equation 63 for updating the estimate.
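The triangle computation can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the relations $d_1 = a\cos\theta_1$, $d_1' = b\cos\theta_2$, $\theta_1 + \theta_2 = \theta$ and $d_1/\Delta t_{ij} = d_1'/\Delta t_{ik}$, with $a = |v_i v_j|$, $b = |v_i v_k|$ and a third side $c = |v_j v_k|$ (the name `c` and the function name are assumptions introduced here). Eliminating $\theta_2$ gives $\tan\theta_1 = (a\,\Delta t_{ik}/(b\,\Delta t_{ij}) - \cos\theta)/\sin\theta$.

```python
import math


def observed_speed(a, b, c, dt_ij, dt_ik):
    """Observed speed d_1 / dt_ij from a triangle of failed nodes.

    a = |v_i v_j|, b = |v_i v_k|, c = |v_j v_k| (assumed edge-length estimates);
    dt_ij, dt_ik = differences in failure times (both positive).
    """
    # Angle theta at v_i from the law of cosines, clamped against rounding.
    cos_theta = (a * a + b * b - c * c) / (2.0 * a * b)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    # Solve b*cos(theta - theta_1) = a*cos(theta_1) * dt_ik / dt_ij for theta_1.
    ratio = (a * dt_ik) / (b * dt_ij)
    theta_1 = math.atan2(ratio - math.cos(theta), math.sin(theta))
    d_1 = a * math.cos(theta_1)
    return d_1 / dt_ij
```

As a sanity check under the straight-front approximation, a front moving along the x-axis at speed 2 past $v_j = (-3, 1)$, $v_k = (-2, -2)$ and $v_i = (0, 0)$ gives $\Delta t_{ij} = 1.5$ and $\Delta t_{ik} = 1$, and the function recovers a speed of 2.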


Figure 44: The construction used to compute speed when a node fails.


$$ s_i^-(t) = s_i(t - \Delta t) \tag{60} $$

$$ s_i(t) = s_i^o(t) \tag{61} $$
(62)


$$ s_i(t) = \tfrac{1}{2}\left(s_i^-(t) + s_i^o(t)\right). \tag{63} $$

Alive nodes: For nodes which have not yet failed, the speed prediction is the same as in Equation (60). To calculate $T_i$ and update $s_i$, each node maintains a distance $d_{ic}(t) = d_{ic}(t - \Delta t) - \Delta t\, s_i(t - \Delta t)$, which is the estimate of the minimum distance from the node to the curve. When a node $v_j$ fails, its speed estimate is applicable to a node $v_i$ (not failed) only when $v_j$ is the closest amongst the failed nodes to node $v_i$. Figure 70 illustrates this concept. Since the curve evolution is given by the speed in the normal direction, the time which the ellipse (failure phenomenon) takes to reach $v$ is independent of $v_2$, and $v_1$ is determined by looking at the point on the curve closest to $v$. A similar idea can be extended onto the graph. Figure (a) shows all the alive nodes in red and the failed node as a blue circle. When the failed node satisfies this condition, we use Equation 64 to update the estimated speed, and $T_i(t) = d_{ic}/s_i(t)$.


$$ s_i(t) = \tfrac{1}{2}\left(s_i^-(t) + s_j^o(t)\right). \tag{64} $$
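The predict-and-update cycle at an alive node can be sketched as below. This is a hedged illustration: the class and function names are hypothetical, and the smoothing weight of one half is an assumption about the averaging in Equations 63 and 64.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NodeState:
    speed: float          # s_i, current estimate of the speed of approach
    dist_to_curve: float  # d_ic, estimated minimum distance to the curve


def step_alive(state: NodeState, dt: float,
               observed: Optional[float] = None) -> Tuple[NodeState, float]:
    """Advance one inter-failure interval dt at a node that has not failed.

    observed is the speed s_j^o reported by the closest failed node, if any.
    Returns the new state and the estimated time to failure T_i.
    """
    # Distance shrinks at the previously estimated speed.
    dist = state.dist_to_curve - dt * state.speed
    # Prediction: the speed is assumed constant between two failures (Eq. 60).
    prior = state.speed
    # Update: average prior and observation (assumed weight 1/2, Eq. 64).
    speed = prior if observed is None else 0.5 * (prior + observed)
    time_to_failure = dist / speed if speed > 0 else float("inf")
    return NodeState(speed, dist), time_to_failure
```

For instance, a node with speed estimate 2 and distance 10 that receives an observed speed of 4 after one time unit moves to a distance of 8, a speed of 3 and a time to failure of 8/3.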

Random Failures: In order to make our algorithm robust to random failures due to causes other than a systematic time-evolving failure, we first detect the occurrence of a systematic failure before estimating times to failure at the nodes. For this, the following simple rules should suffice: 1) if all the neighbors of a failed node are alive, no speed is estimated and no action is taken; 2) if, for a failed node, there is a neighbor which already failed, then a tentative speed is computed; and 3) if, for a failed node, there is a neighbor which previously failed and has a speed computed, the newly failed node computes its speed and compares it with the tentative speed of its neighbor. If the speeds are close enough, a systematic failure is detected and the information is broadcast to all alive nodes to compute the time to failure. If there is more than one failed neighbor, then a systematic failure is detected if at least one neighbor's speed is compared and confirmed.
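The three rules can be written as a small decision function. The sketch below is illustrative only: the function name, the return labels and the relative tolerance `rel_tol` used for "close enough" are assumptions, since the text does not specify a threshold.

```python
def classify_failure(failed_neighbor_speeds, new_speed, rel_tol=0.2):
    """Apply the three detection rules to a newly failed node.

    failed_neighbor_speeds: one entry per neighbor that failed earlier,
        holding its tentative speed or None if no speed was computed.
    new_speed: speed computed for the newly failed node (unused under rule 1).
    rel_tol: assumed relative tolerance for two speeds to count as close.
    """
    if not failed_neighbor_speeds:
        return "ignore"  # rule 1: all neighbors alive, treat as random
    known = [s for s in failed_neighbor_speeds if s is not None]
    if not known:
        return "tentative"  # rule 2: only a tentative speed is available
    for s in known:  # rule 3: one confirmed comparison is enough
        if abs(new_speed - s) <= rel_tol * max(abs(s), abs(new_speed)):
            return "systematic"
    return "tentative"
```

Returning "tentative" when no comparison succeeds is one possible design choice; it keeps the new speed available for comparison against later failures.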

The only step which needs some explanation for its implementation is the broadcasting of the information that a systematic failure is detected. When a node fails and a speed is confirmed, it broadcasts this speed to all its neighbors. As the data packets are passed along, they aggregate the edge lengths along their way. Any node broadcasts the information about a node failure only once. Further, if the distance from the failed node $v_j$ to the node $v_i$ is greater than $d_{ic}$, then $v_i$ does not propagate this information. This way, the information about the node $v_j$ having failed will reach only those nodes for which $v_j$ is the closest amongst all the failed nodes. Based on our description of alive nodes, this is sufficient, and a significant cost in terms of power is saved.
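A centralized simulation of this pruned broadcast might look as follows. It is a sketch under stated assumptions: packets are processed in order of aggregated path length (a priority queue stands in for the "broadcast only once" rule), edge lengths are supplied as estimates, and the function name is hypothetical.

```python
import heapq


def propagate_failure(adj, d_ic, failed):
    """Simulate the pruned broadcast started by a confirmed failure.

    adj: dict node -> dict neighbor -> estimated edge length.
    d_ic: dict alive node -> current distance estimate to the curve.
    failed: the node whose failure is announced.
    Returns dict node -> aggregated path length for nodes that accept the news.
    """
    reached = {}
    seen = set()
    heap = [(0.0, failed)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u in seen:
            continue  # each node handles a given failure only once
        seen.add(u)
        if u != failed:
            if dist > d_ic[u]:
                continue  # another failed node is closer: do not forward
            reached[u] = dist
        for w, length in adj[u].items():
            if w not in seen:
                heapq.heappush(heap, (dist + length, w))
    return reached
```

On a path a, b, c, d with unit edges and `d_ic = {"b": 5, "c": 1.5, "d": 5}`, a failure at a reaches only b, because c is closer to some other part of the curve and therefore stops the propagation.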

Simulation and Evaluation: We simulated a time-evolving failure on a network with 500 nodes. The time-evolving curve was taken to be an ellipse, with the speed at the tip of the major axis being twice that at the tip of the minor axis and continuously changing in between. Figures 45(a) and 45(b) pictorially depict the time estimates at the nodes for two different times. The red color implies that the nodes estimate a relatively smaller time to failure and the blue implies longer times. Figures 45(c) and 45(d) show the error between the speed and time estimates and the true values as a function of time. The results in Figures 45(c) and 45(d) are for an isotropic case where the evolving curve was a circle and the speed is constant in all directions. We chose this scenario to illustrate the convergence of speed estimates. The error in speed values converges to zero as time progresses. The error in time estimates converges to a positive value. This is because of the error in approximating the true straight-line Euclidean distances between points with the path distance on the graph.

Figure 45: Simulation and evaluation results. (a) Snapshot depicting time estimates at time $t_1$. (b) Snapshot depicting time estimates at time $t_2 > t_1$. (c) Convergence of speed to the true value (panel title: efficacy of speed estimation). (d) Error in time estimates vs. time (panel title: efficacy of tracking time remaining to failure).


5.3.3  Summary  and  Discussions: 

As  one  of  the  most  important  objectives  of  this  project,  we  aim  to  develop  new  models  and  approaches  to  detect 
and  track  multiple  failures.  To  this  end,  we  have  presented  a  simple,  low  complexity  and  distributed  algorithm  to 
detect  and  track  a  systematic  and  time-evolving  failure.  The  accuracy  of  the  results  despite  a  very  simple  model 
shows  much  promise  in  this  strategy.  In  our  future  research,  we  will  investigate  more  sophisticated  models  for 
tracking  the  curve  evolution  to  further  improve  the  accuracy  and  convergence  rates.  A  significant  hurdle  to  cross 
would  be  obtaining  good  statistics  for  the  error  between  actual  distance  between  two  nodes  and  the  path  length 
on  a  geometric  graph. 

In our DTRA-funded work to date, we have developed very efficient, very rapid and novel algorithms to detect and localize static failures viewed as coverage holes, where the connectivity was primarily defined in terms of range proximity. These algorithms also enjoy the very important characteristic of being distributed and coordinate-free. The latter feature is particularly important in a WMD environment, where it is more likely that nodes would not benefit from the Global Positioning System (GPS) due to its unavailability. A natural follow-up question is then that of characterizing the correlated failures caused by one or more WMD events, and distinguishing them from failures which are random and unrelated, by using failure detection and tracking algorithms.

5.4  Identification  and  Tracking  of  Systematic  Failures 

Our  goal  is  to  detect  and  track  “systematic”  failures,  which  amount  to  being  spatially  and  temporally  correlated. 

In a region where a sensor network is deployed, catastrophic events such as wildfires, volcanic eruptions or massive explosions cause a large number of node failures. The nature of these events is such that the positions of the failed nodes are confined to a geographic region. Also, all the nodes within this region fail. The area of such a region increases with time, and the evolution of its boundary is continuous in time.

5.4.1  Network  Classification 

The application of network synthesis and classification requires a proper model capturing the properties that control the network evolution as well as the features that characterize the network. To this end, we propose a generalized Markov Graph model, whose dependence structure is used to control the network evolution, and whose probability distribution over networks provides the features characterizing the networks.

To date, research in network analysis has evolved in many dimensions. For network synthesis, the Barabasi-Albert model [101] successfully generates networks that follow a power law degree distribution based on the assumption of preferential attachment. In statistics, the Markov Graph model [102], the p* model [103] as well as the models from Statistical Mechanics [104] are well known for characterizing the probability distribution of networks. More recently, we proposed a method based on a Markov Graph model to address the problem of network classification [105].

We were the first to propose a method based on the Markov Graph model to classify different types of networks [105]. It makes use of two features, the degree distribution and the number of triads, which determine the probability distribution of networks in the Markov Graph model, as the crucial features for the classification of social networks. In the research area of social network synthesis, the Barabasi-Albert model as well as many other related models enable the synthesis of the evolution of simple networks whose degree distributions obey the power law [101]. The algorithms for these models are clear and efficient. The open question, however, in light of these models' shortfall in fully capturing the probabilistic structure of a network, concerns the statistical behavior of the required additional features, namely the clustering and crowding coefficients³. To this end, we propose a model which provides answers to this question on both counts.

Dependence structure of the network. In a generalized Markov Graph model, pairs of nodes and triplets of nodes are both considered as basic units in the dependence graph $D$. We use the term “doublet-node-vertex”, denoted by $\{i, j\}$, to represent the basic unit in $D$ corresponding to a pair of nodes $i$ and $j$ in $G$, and “triplet-node-vertex”, denoted by $\{a, b, c\}$, for that corresponding to a triplet of nodes $a$, $b$ and $c$ in $G$. $M_{\{a,b\}}$ is the value of the element $\{a, b\}$ in the adjacency matrix $\{M\}$, and $T_{\{a,b,c\}}$ represents the state of a triplet-node-vertex: $T_{\{a,b,c\}} = 1$ when $M_{\{a,b\}}$, $M_{\{b,c\}}$ and $M_{\{c,a\}}$ all take value 1, and $T_{\{a,b,c\}} = 0$ otherwise.

³ Crowding coefficient is a new feature for networks.

The dependence graph for networks under our proposed generalized Markov Graph model is defined as:

$$ D_{GM} = \{ \mathrm{node}_{D_{GM}}, \mathrm{edge}_{D_{GM}} \}. $$

The node set of $D_{GM}$ is defined as $\mathrm{node}_{D_{GM}} = N_2 \cup N_3$, where $N_2 = \bigcup_{i,j} \{\{i, j\}\}$ is the set of all doublet-node-vertices, and $N_3 = \bigcup_{i,j,k} \{\{i, j, k\}\}$ is the set of all triplet-node-vertices. The set $\mathrm{edge}_{D_{GM}}$ includes all edges between two nodes in $D_{GM}$ whose states conditionally depend on each other; this dependence structure is as follows:

1.  if $\{i,j\} \cap \{a,b\} \neq \emptyset$ and $\{i,j\} \neq \{a,b\}$, the probability mass function $P(M_{\{i,j\}})$ depends on variable $M_{\{a,b\}}$;

2.  if $\{i,j\} \cap \{a,b,c\} \neq \emptyset$ and $\{i,j\} \not\subset \{a,b,c\}$, $P(M_{\{i,j\}})$ depends on variable $T_{\{a,b,c\}}$ and $P(T_{\{a,b,c\}})$ depends on variable $M_{\{i,j\}}$;

3.  if $\{i,j,k\} \cap \{a,b,c\} \neq \emptyset$ and $\{i,j,k\} \neq \{a,b,c\}$, $P(T_{\{i,j,k\}})$ depends on variable $T_{\{a,b,c\}}$.

Crucial features for networks. Any pseudo-homogeneous simple network graph $G$ with an associated dependence graph $D_{GM}$ has a probability mass function:

$$ P_{GM}(G) = Z^{-1} \exp\left( \sum_k d_k(G)\,\theta_k + \sum_c b_c(G)\,\tau_c + \sum_m t_m(G)\,\gamma_m \right), \tag{65} $$

where $d_k(G)$ is the number of nodes which share the same degree $k$ in network $G$, $\theta_k$ is the associated coefficient, $b_c(G)$ is the number of nodes which share the same clustering coefficient $c$, $\tau_c$ is the associated coefficient, $t_m(G)$ represents the number of triads which share the same crowding coefficient $m$, and $\gamma_m$ is the associated coefficient.

The sequences $\{d_k(G)\}$, $\{b_c(G)\}$, $\{t_m(G)\}$ correspond to the histograms of the degree list, the clustering coefficient list and the crowding coefficient list, respectively, and their information could be expressed by the degree distribution, the clustering coefficient distribution and the crowding coefficient distribution of network $G$.
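The first two histograms can be computed directly from an adjacency structure, as in the sketch below. It assumes the standard local clustering coefficient (fraction of linked neighbor pairs), rounds coefficients to form histogram bins, and uses hypothetical function names; the crowding coefficient is omitted because its definition is not given in this section.

```python
from collections import Counter
from itertools import combinations


def degree_histogram(adj):
    """d_k(G): number of nodes with degree k, for adjacency sets adj."""
    return Counter(len(nbrs) for nbrs in adj.values())


def clustering_histogram(adj, ndigits=2):
    """b_c(G): number of nodes with local clustering coefficient c.

    Coefficients are rounded to ndigits so that they can serve as bins
    (the binning is an assumption made for this illustration).
    """
    hist = Counter()
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            c = 0.0  # fewer than two neighbors: no neighbor pair exists
        else:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            c = 2.0 * links / (k * (k - 1))
        hist[round(c, ndigits)] += 1
    return hist
```

For a triangle with one pendant node, the degree histogram has two nodes of degree 2, one of degree 3 and one of degree 1, and the clustering histogram has two nodes at 1.0, one at 0.33 and one at 0.0.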

Network synthesis. We evaluate the synthesis power of the generalized Markov Graph model by designing a social network evolution algorithm based on its properties introduced in the previous section. According to its first two properties, the probability function for the state of a doublet-node-vertex $\{i, j\}$ in a network should only depend on the state of all the doublet-node-vertices that include node $i$ or node $j$, as well as the triplet-node-vertices that include either one:

$$ P_{GM}(M_{\{i,j\}}) = f(M_{\{i,1\}}, M_{\{i,2\}}, \ldots, M_{\{1,j\}}, M_{\{2,j\}}, \ldots, T_{\{i,1,2\}}, T_{\{i,1,3\}}, \ldots, T_{\{1,2,j\}}, T_{\{1,3,j\}}, \ldots), $$

where $f(\ldots)$ is a function that depends on the variables $M_{\{i,1\}}, M_{\{i,2\}}, \ldots, M_{\{1,j\}}, M_{\{2,j\}}, \ldots, T_{\{i,1,2\}}, T_{\{i,1,3\}}, \ldots, T_{\{1,2,j\}}, T_{\{1,3,j\}}, \ldots$. As an application example of the generalized Markov Graph model to network synthesis, we change the preferential attachment algorithm in the Barabasi-Albert model, which is a special case of a Markov Graph model, to a new algorithm based on a generalized Markov Graph model, by specifying the probability function of the state of a doublet-node-vertex $\{i, j\}$ in network $G$ to be:


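$$P'_G(M_{\{i,j\}} \mid i > j) = \frac{1}{C}\, k_j \cdot \frac{1}{2} \sum_{k \neq n,\, k \neq j,\, n \neq j} M_{\{j,k\}} M_{\{j,n\}} M_{\{n,k\}}$$

where $k_j$ represents the degree of node $j$, $\sum_{k \neq n,\, k \neq j,\, n \neq j} M_{\{j,k\}} M_{\{j,n\}} M_{\{n,k\}} / 2$ is the number of triads that include node $j$, $C$ is a normalization factor, and $i > j$ indicates that $i$ is a newer node than $j$. The probability is thus proportional to the degree of node $j$ and to the number of triads that include it. Note that $P'_G(M_{\{i,j\}} \mid i > j)$ is a special case of $P_{GM}(M_{\{i,j\}})$.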

Based on $P'_G(M_{\{i,j\}} \mid i > j)$ resulting from a generalized Markov Graph model, we specify the rules of a network evolution and verify that the resulting network satisfies the features described earlier. The rules are as follows:

1. There are $m_0$ nodes in the network at time $t = 0$.

2. At each time step, a new node with $m_0$ attached edges is introduced to the network.

3. Each of these edges connects to an existing node in the network. The probability of a node being picked is $P'_G(M_{\{i,j\}} \mid i > j)$, where $i$ denotes the newly added node and $j$ the node being picked.
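The evolution rules above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: it assumes that the $m_0$ seed nodes form a complete graph (the text does not specify the seed edges), that each new node attaches to distinct existing nodes, and that the attachment weight of node $j$ is the product of its degree and the number of triads through it. The function name `grow_network` and its parameters are hypothetical.

```python
import numpy as np


def grow_network(m0=4, steps=50, seed=0):
    """Grow a network with attachment weight = degree * triads at the node.

    Assumption (not stated in the text): the m0 seed nodes are fully
    connected, so that every seed node starts with a nonzero weight.
    """
    if m0 < 3:
        raise ValueError("m0 >= 3 is needed so that seed nodes belong to triads")
    rng = np.random.default_rng(seed)
    n_total = m0 + steps
    A = np.zeros((n_total, n_total), dtype=np.int64)
    A[:m0, :m0] = 1 - np.eye(m0, dtype=np.int64)  # complete seed graph
    n = m0
    for _ in range(steps):
        sub = A[:n, :n]
        degree = sub.sum(axis=1)
        # Row sums of (A^2 * A) count each triangle through a node twice.
        triads = ((sub @ sub) * sub).sum(axis=1) // 2
        weight = (degree * triads).astype(float)
        # Attach to at most m0 distinct nodes with positive weight.
        m = min(m0, int(np.count_nonzero(weight)))
        targets = rng.choice(n, size=m, replace=False, p=weight / weight.sum())
        A[n, targets] = 1
        A[targets, n] = 1
        n += 1
    return A
```

Under these assumptions a node that belongs to no triad is never selected, which follows directly from the stated proportionality; a smoothing term would be one way to relax this, but it is not part of the rules above.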

5.4.2  Systematic  Failure  Analysis 

Sensor networks are often deployed to monitor hazardous environments. Several events or phenomena in such environments may cause spatially and temporally correlated failures in the network. We present here a low-complexity distributed algorithm for detecting and tracking such failures. We assume that nodes inside the failure region are either destroyed or unable to communicate with any other node. The algorithm assumes neither global coordinate information for the nodes nor any capability to sense the phenomenon causing the failure, and it allows for some nodes failing randomly. We utilize the well-studied Restricted Delaunay Triangulation (RDT) and show that we can distinguish between systematic and random failures, thus enabling their robust detection. By formulating simple, local optimization problems which have closed-form solutions, the evolution of the failure front is tracked accurately. The methodology is evaluated using several substantiating simulations.

Topological Analysis In contrast to deployment-related failure problems, the research on this topic focuses on detection of dynamic failure regions and analysis of their topological changes. While the related work presented here is in no way comprehensive, we intend to provide an insight into the distributed algorithms for detecting topological changes.

The work in [91, 93] seeks to detect topological changes on a scalar field defined on the nodes in the network. The topological changes are described as creation/loss of holes and formation/merging of connected components. In [91], such events are detected by observing the 1-hop neighborhoods of nodes and detecting the topological changes therein, with the assumption that the communication graph accurately captures the topology of the field of interest. The approach in [93] constructs a tree structure representing the adjacency of topological components and detects a topological change by analyzing the resulting change in this structure.

In [92, 96], the authors seek to track an evolving topological component, for example, a subset of the region where a defined field is above a threshold. The topological component of interest is assumed to be of polygonal shape, which is identified by computing the convex hull of the component. Such computation is facilitated by maintaining a hierarchical structure in the network. The cluster heads gather the required information from nodes in the cluster to compute the convex hull. [96] extends such analysis to cases where the sensor measurements are noisy and a clear transition is not well defined. A regression-based spatial estimation technique determines discrete points on the boundary and estimates a confidence band around the entire boundary, and a Kalman filter-based temporal estimation technique tracks changes in the boundary and aperiodically updates the spatial estimate. They assume global coordinate information in order to perform the regression analysis.

In contrast to the above-mentioned articles, our work deals with catastrophic situations with the following characteristics:

1. the nodes inside the region of interest fail completely, or are unable to communicate with any other node,

2. such nodes cannot participate in response mechanisms, restricting the tasks performed to the information available in the neighboring (still active) nodes,

3. there is no hardware capable of sensing the phenomenon causing the failure,

4. and there are “randomly” occurring failures of nodes.

Network model We consider a region $R \subset \mathbb{R}^2$ in which a set of nodes $V$ is deployed. For a communication radius $r_c$, a communication graph $G = (V, E)$ is induced on $V$, where $(v_1, v_2) \in E \iff d_{12} = |(v_1, v_2)| < r_c$. An edge $(v_1, v_2)$ in $G$ implies that $v_1$ is within the communication region of $v_2$ (and vice versa), or in other words, $v_1$ can be observed by $v_2$. For a node $v_i \in G$ (or equivalently, $v_i \in V$), we denote by $N_i$ the neighbors of $v_i$ in $G$. Given this model for the communication graph, the notion of the deployment region $R$ may be made precise as follows.

Definition 15. The projection $C'$ of a subgraph $C \subset G$ is the collection of line segments $\overline{v_i v_j}$ in $R$ such that $(v_i, v_j) \in C$.

Definition 16. The outer boundary $\partial(R)$ is the union of line segments in the projection of $G$ which encloses all the nodes. The region of deployment $R$ is the region enclosed by $\partial(R)$.


Throughout, we use $v_i$ to denote the node with index $i$ and also its position in $R$. We assume that we have neither localization, i.e., no global coordinates for the nodes are known, nor any global orientation information. We do, however, assume that the distances between the nodes (the lengths of edges in the communication graph) are known to us. This information can be obtained from several techniques such as Received Signal Strength (RSS) estimation, time difference of arrival (TDoA), etc. [106].
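To make the network model concrete, the sketch below builds the communication graph and its edge lengths from node positions. It is only a simulation aid under stated assumptions: the positions are used solely to generate the inputs that the algorithms are allowed to see (neighbor lists and edge lengths), and the function name `communication_graph` is hypothetical.

```python
import numpy as np


def communication_graph(positions, r_c):
    """Return neighbor lists N_i and edge lengths for the radius-r_c graph.

    positions: (n, 2) array of node positions (available in simulation only).
    An edge (i, j) exists when the distance between i and j is below r_c.
    """
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = (dist < r_c) & ~np.eye(n, dtype=bool)  # no self-loops
    neighbors = {i: np.flatnonzero(adj[i]).tolist() for i in range(n)}
    lengths = {
        (i, j): float(dist[i, j])
        for i in range(n)
        for j in neighbors[i]
        if i < j
    }
    return neighbors, lengths
```

For example, three collinear nodes at x = 0, 1, 3 with $r_c = 1.5$ yield a single edge of length 1 between the first two nodes, and the third node is isolated.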

Systematic  Failure  In  the  work  presented  here,  we  endeavor  to  track  such  failures  only  using  the  communication 
graph  and  its  edge  lengths.  The  resolution  of  the  regions  we  are  able  to  detect  is  restricted  by  the  node  density.  As 
such,  we  require  the  region  of  systematic  failure  to  be  “big  enough”  relative  to  the  communication  radius,  so  as 
to  be  detected,  and  hence  tracked.  We  now  make  the  above  description  of  the  systematic  failures  mathematically 
precise. 

Definition 17. For a time-evolving smooth and simple curve $C(s, t) : S \times \mathbb{R} \to R$, a systematic failure is defined to be occurring if

1. $\forall t > 0$, $v_i \in C(s, t) \Rightarrow v_i$ fails at time $t$,

2. $C(s, t)/\partial(R)$ is closed,

3. $\exists t_0$ such that $\forall t > t_0$, we can inscribe a circle of radius $r_c/\sqrt{3}$ in the region enclosed by $C(s, t)$ (when $C(s, t)$ is closed), or by $C(s, t)$ and $\partial(R)$ (when $C(s, t)$ is open and intersects the boundary),

4. the area enclosed by $C(s, t)$ (when $C(s, t)$ is closed), or by $C(s, t)$ and $\partial(R)$, is increasing with time.


The time evolution of the curve $C(s, t)$ can be specified by assigning a velocity vector at each point of the curve in the normal direction. We consider the vector only in the normal direction, as any tangential component will only result in a reparameterization of the curve. Therefore, the boundary of any systematic, time-evolving failure region can be described by a time-evolving curve given as:

$$\frac{\partial C(s, t)}{\partial t} = -v(s, t)\,\mathbf{n}, \qquad (66)$$

where $v(s, t)$ is a scalar specifying the speed with which the curve is expanding at parameter $s$ and time $t$, and $\mathbf{n}$ is the normal vector.

In contrast to a systematic failure, nodes may fail “randomly”, in which case the positions of the failed nodes do not necessarily lie in a region which can be expressed by the above properties. In fact, when nodes fail randomly, their positions are usually uncorrelated in space or time.


Algorithm Overview Our process of detecting and tracking the systematic failure has the following steps:

1. compute local coordinates

2. compute the restricted Delaunay triangulation and its boundary

3. compute tight subgraphs

4. identify the tight subgraphs which enclose failed nodes (active contours)

5. detect systematic failures and compute the speed of propagation

Figure 46: Comparison of the original subgraph (in global coordinates) with the one realized using local coordinates. The subgraph is induced by a node (shown as a red block) and its 1-hop neighbors. (a) Subgraph induced on $G$ by a node and its neighbors. (b) Subgraph as realized according to local coordinates.

Local coordinates At each node $v_i$, given the subgraph $G_i \subset G$ induced by the nodes $v_i \cup N_i$ and the distance function $D_i$ giving the lengths of edges in $G_i$, we compute the local coordinates $\mu_i : v_i \cup N_i \to \mathbb{R}^2$. The local coordinates are computed such that $\|\mu_i(v_j) - \mu_i(v_k)\| = D_i(j, k)$, $\forall (v_j, v_k) \in G_i$. Localization in sensor networks is a well-researched subject, with several distributed algorithms presented in the literature [107], [108]. Unlike network localization problems seeking to compute global coordinates for the nodes, we are primarily concerned with coordinates locally at each node and its neighbors. Table 4 describes the algorithm for computing local coordinates and Figure 46 shows an example application of this algorithm.
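Two building blocks of such a local-coordinate computation can be sketched as follows. This is an illustrative sketch under assumptions, not the algorithm of Table 4 itself: `place_triangle` fixes a 3-clique using only its edge lengths (the observing node at the origin, one neighbor on the x-axis), and `trilaterate` positions a further node from its distances to three or more already-placed nodes by linearizing the circle equations. Both function names are hypothetical, and noise handling and clique selection are omitted.

```python
import numpy as np


def place_triangle(d_ia, d_ib, d_ab):
    """Coordinates of a 3-clique (v_i, v_a, v_b) from its edge lengths.

    v_i is placed at the origin, v_a on the positive x-axis, and v_b in the
    upper half-plane (this choice fixes the free rotation and reflection).
    """
    x = (d_ia**2 + d_ib**2 - d_ab**2) / (2.0 * d_ia)
    y = np.sqrt(max(d_ib**2 - x**2, 0.0))
    return np.array([[0.0, 0.0], [d_ia, 0.0], [x, y]])


def trilaterate(anchors, dists):
    """Position of a node from distances to >= 3 non-collinear placed nodes.

    Subtracting the first circle equation from the others gives the linear
    system 2 (p_k - p_0) . x = |p_k|^2 - |p_0|^2 - d_k^2 + d_0^2.
    """
    P = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    A = 2.0 * (P[1:] - P[0])
    b = (P[1:] ** 2).sum(axis=1) - (P[0] ** 2).sum() - d[1:] ** 2 + d[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

The least-squares solve is used so that more than three anchors, or slightly inconsistent distance estimates, can be handled in the same way.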

Restricted Delaunay Triangulation The Restricted Delaunay Triangulation ($RDT(G)$) [109] has the following properties, which are useful in our context.

1. $RDT(G)$ is locally similar to the Delaunay triangulation $DT$, in that it is planar and the circumcircle of a triangle does not contain any other node.

2. Denote by $UDel(G)$ the graph obtained by removing all the edges of length greater than $r_c$ from $DT$. $RDT(G)$ satisfies the following set inclusions:

$$UDel(G) \subseteq RDT(G) \subseteq G$$

A distributed algorithm for computing $RDT(G)$ was given in Chapter 3 of the thesis [110]. An important aspect of this algorithm in the context of distributed processing is that, under dynamic conditions, when a set of nodes fail, $RDT(G)$ only needs to be recomputed locally. It is also shown that the communication complexity (total number of messages broadcast over all the nodes) is of the order O(√n log n). Given a triangulation on a manifold, the boundary of the manifold may be found by looking at the edges in the triangulation which are faces of only 1 triangle. The boundary of $RDT(G)$ gives a close approximation to the boundary of the network, a part of which may potentially describe the propagating failure front. Figure 47 shows $RDT(G)$ and the boundary (in red).

Table 4: Algorithm for computing local coordinates $\mu_i$ at $v_i$

    set $\mu_i(v_i) = (0, 0)$
    if there exists a 4-clique including $v_i$
        find the most robust 4-clique
    else if there exists a 3-clique including $v_i$
        select a 3-clique
    else
        distribute the nodes evenly around $v_i$
        return
    compute coordinates for the nodes in the clique selected
    put the collected clique nodes into poolNodes
    while nodes remaining with no coordinates {
        if there exists a node with 3 links into the poolNodes
            triangulate its position
        else if there exists a node with 2 links into the poolNodes
            choose the node with smallest distance from node $v_i$
            fix the position which conforms with the local adjacency
        else
            find a position which conforms with the local adjacency
        put the nodes with computed coordinates into poolNodes
    }

Figure 47: The boundary of the restricted Delaunay triangulation of $G$.

Figure 48: Dividing a connected component in $\partial(RDT(G))$ into tight subgraphs, shown before (a) and after (b) the division.
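The boundary rule stated above (an edge lies on the boundary when it is a face of only one triangle) can be sketched in a few lines. This is a centralized illustration, assuming the triangulation is available as a list of node-index triples; the function name `boundary_edges` is hypothetical and the distributed computation is not shown.

```python
from collections import Counter


def boundary_edges(triangles):
    """Return the edges that belong to exactly one triangle.

    triangles: iterable of (a, b, c) node-index triples.
    Edges are returned as sorted (min, max) index pairs.
    """
    count = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            count[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, k in count.items() if k == 1)
```

For two triangles (0, 1, 2) and (1, 2, 3) sharing the edge (1, 2), the shared edge is interior and the remaining four edges form the boundary.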

Tight subgraphs. To “tightly” surround a region which might potentially be the systematic failure region, we need to further partition $\partial(RDT(G))$. Given a subgraph with loops and handles as shown in Figure 48(a), our goal is to divide this subgraph into smaller loops and handles as shown in Figure 48(b). Note that the outermost boundary of $\partial(RDT(G))$ encloses all the nodes and does not give us any information about the “holes”. Therefore, we remove the outermost boundary. The handles in the figure may arise if the failure region is near the boundary. In all subsequent discussions, we take $\partial(RDT(G))$ to be the boundary with the outermost boundary removed. The algorithm given in Table 5 computes tight subgraphs.

Active Contours. We need a test which identifies whether a given tight subgraph is an active contour (i.e., it encloses failed nodes). The main idea is to test whether the failed nodes being observed by the nodes on a tight subgraph are located in the region separated by the subgraph. When a failed node does not share such a region with any active node, we say that the failed node lies in the region being separated by the subgraph. The theorem below summarizes this idea.

Table 5: Algorithm for finding tight subgraphs in $\partial(RDT(G))$

    start
        select a node $v_i$ with degree 2
        send a forward packet $(i, i, 1)$ to one of its neighbors
    when a node $v_k \neq v_i$ receives a forward packet $(i, j, h_i(j))$ from node $v_j$
        store in table $(i, j, h_i(j))$
        broadcast to all nodes in $N_k \setminus j$ the packet $(i, k, h_i(j) + 1)$
    when a node $v_k \neq v_i$ receives a reverse packet $(i)$
        assign $v_k \to comp(i)$
        find $j = \arg\min_j h_i(j)$
        transmit to $j$ the reverse packet $(i)$
    at node $v_i$
        wait to receive all the forward packets $(i, j, h_i(j))$
        if received any forward packets
            find $j = \arg\min_j h_i(j)$
            send the reverse packet $(i)$ to $j$
        else
            find the shortest path $p$ between the leaves
            $\forall v_k \in p$, assign $v_k \to comp(i)$

Definition 18. We define a total cyclic ordering $\circ$ on the neighbors $N_i$ of $v_i$ as the order in which the nodes appear in either the clockwise or the counterclockwise direction.

Theorem 19. Let $v_f$ denote a failed node. A subgraph $C \subset RDT(G)$ which has the following properties

1. $\exists v_i \in C$ with $\deg_C(v_i) = 2$ such that $v_f \in N_i^{RDT}$,

2. $\forall v_i \in C$, $\exists$ an active node $v_{aj} \in N_i^{RDT}$ where $v_{aj} \notin C$,

3. $C$ is a modified tight subgraph

is an active contour enclosing $v_f$ if and only if

1. the edges in $C$ are faces of at most 1 triangle,

2. $\forall v_i \in C$ with $\deg_C(v_i) = 2$, there does not exist an ordering of the form $v_j \circ v_f \circ v_{aj} \circ v_k \circ \cdots$, where $v_j, v_k \in C \cap N_i^{RDT}$ and $v_{aj} \in N_i^{RDT}$.

Velocity Estimation. To estimate the normal direction $\mathbf{n}(s, t)$ and the speed $v(s, t)$ locally, we approximate the curve as a straight line. Specifically, for an observing node $v_i$, let the normal to the curve at $s_0$ pass through $v_i$. In the vicinity of $v_i$, we approximate $C(s, t)$ as its tangent line at $s_0$. The problem can then be restated as an estimation of the direction $\theta$ which the normal line at $s_0$ makes with the x-axis in the local coordinate system $\mu_i$, and of the speed $v(s_0)$. Denote the unit vector in the direction of the normal line at $s_0$ (direction of arrival) as $\mathbf{n}_i = (a, b)$. The computations for estimating the velocity are performed at one of the surviving neighbors of $v_i$ when the node $v_i$ fails. For each node $v_j$ in the set of neighboring failed nodes $N_i^f \subset N_i$, denote the local failure time as $t_j = \tau_j - \tau_i$. As we do not assume global time synchronization, only the local failure times $t_j$ are available to us. The relationship between the positions of the failed nodes, the direction of arrival, and the speed can then be given locally as:

$$\mathbf{n}_i \cdot v_j + v\,t_j = 0 \;\Rightarrow\; a x_j + b y_j + v t_j = 0 \qquad (67)$$

where $(x_j, y_j)$ are the local coordinates of $v_j$. In the case of a systematic failure, Equation (67) will be satisfied for all failed nodes exactly, and the normal direction with the speed can be computed by solving a system of three linear equations. In a practical scenario, we may have random failures of some of the nodes and errors in the recorded failure times. Therefore, we formulate the estimation of the velocity as the following optimization:

$$(\mathbf{n}_i, v) = (a, b, v) = \arg\min f(a, b, v) = \arg\min \sum_{j \in N_i^f} (a x_j + b y_j + v t_j)^2 \qquad (68)$$

with the following constraint:

$$h = a^2 + b^2 - 1 = 0. \qquad (69)$$

The above optimization problem has a closed-form solution. Figure 49 shows some simulations demonstrating the accuracy and robustness of the above method.
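One way to obtain a closed-form solution of the constrained least-squares problem (68)-(69) is sketched below. This is a sketch under assumptions, not necessarily the derivation used by the authors: the speed is eliminated analytically, which leaves a 2x2 quadratic form in $(a, b)$ whose unit-norm minimizer is the eigenvector of the smallest eigenvalue. The sign ambiguity between $(a, b, v)$ and $(-a, -b, -v)$ is resolved here by requiring a non-negative speed, which is an assumed convention. The function name `estimate_front_velocity` is hypothetical.

```python
import numpy as np


def estimate_front_velocity(xy, t):
    """Minimize sum_j (a x_j + b y_j + v t_j)^2 subject to a^2 + b^2 = 1.

    xy: (m, 2) local coordinates of the neighboring failed nodes.
    t:  (m,) local failure times t_j.
    Returns (n, v) with n = (a, b) a unit vector and v >= 0.
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    stt = float(t @ t)
    if stt == 0.0:
        raise ValueError("at least one nonzero local failure time is required")
    p = xy.T @ t  # (sum_j x_j t_j, sum_j y_j t_j)
    # For fixed n the optimal speed is v = -(p . n) / stt; substituting it
    # back leaves the quadratic form n^T M n with the matrix M below.
    M = xy.T @ xy - np.outer(p, p) / stt
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    n = vecs[:, 0]
    v = -float(p @ n) / stt
    if v < 0.0:  # assumed convention: report a non-negative speed
        n, v = -n, -v
    return n, v
```

With noise-free inputs generated from Equation (67), the smallest eigenvalue is zero and the true direction and speed are recovered exactly (up to floating-point error); with noisy inputs the same formula returns the least-squares estimate.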

5.4.3  Main  Results  on  Network  Classification  and  Failure  Detection 

In  this  part,  we  highlight  our  experimental  results  on  network  classification  and  synthesis  as  well  as  detection 
results  of  systematic  failures. 

Network synthesis. First we present the experimental results on network synthesis. In the first synthesis experiment for our new algorithm, the total number of time steps is 3000 and the initial number of nodes is 4. As shown in Figure 50, the experimental results demonstrate that the corresponding degree distribution of such a model still satisfies a power-law distribution, indicating that the degree distribution is a feature that all resulting networks share in our new model, as in the Barabasi-Albert model.

We also record the changes of the statistics of clustering coefficients from time-step 1500, when the changes of the statistics of clustering coefficients start to be very small, i.e., when the statistics have stabilized in the network. Another experiment with the same setting but for the Barabasi-Albert model is run, and we record the changes of the statistics of clustering coefficients as well. We calculate and compare the standard deviation of the changes for both experiments, normalized by the values of the statistics at the last time-step. The result is shown in Table 6. The changes of the statistics of clustering coefficients in the network generated by the algorithm based on the generalized Markov Graph model are much smaller than those by the Barabasi-Albert model, which indicates that our new model could generate a network with a much faster stabilizing clustering coefficient distribution than the Barabasi-Albert model does.

Figure 49: Robustness of the velocity estimation. The error in the estimated velocity is small when the errors in failure times and positions are small, and it degrades gracefully. Panels: (a) error in speeds computed vs. error in failure times; (b) error in angles computed vs. error in failure times; (c) error in speeds computed vs. error in node positions; (d) error in angles computed vs. error in node positions.


Figure  50:  Degree  distribution  of  the  network  evolving  according  to  the  rules  that  satisfy  the  properties  of  the 
generalized  Markov  Graph  model. 

To further validate that the algorithm based on the generalized Markov Graph model outperforms the Barabasi-Albert model in generating networks with more stable clustering coefficient distributions, we repeat the previous experiment 1000 times for both the Barabasi-Albert model and our new algorithm. Table 7 shows the mean and variance of the selected statistics (mean, variance, skewness, and kurtosis) of the clustering coefficient distributions for both models. Compared to their range $[0, 1]$ (as any clustering coefficient lies between 0 and 1), the variances of the mean and of the variance of clustering coefficients for the new model are very small. This verifies that the clustering coefficient distribution is a common feature of all networks generated by our new model. Furthermore, in Table 7 the ratio of the variance to the mean of almost all the statistics of the clustering coefficient distribution for the new model is much smaller than that for the Barabasi-Albert model. This indicates that the new model could generate networks that have a more stable clustering coefficient distribution than the Barabasi-Albert model does.
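For readers who wish to reproduce statistics of this kind, the sketch below computes the local clustering coefficients of a network and the four summary statistics named above. It is a sketch under assumptions: the adjacency matrix is symmetric with zero diagonal, nodes of degree below 2 are assigned a clustering coefficient of 0, and kurtosis is taken in the non-excess (Pearson) form; the text does not state which conventions were used. The function name `clustering_statistics` is hypothetical.

```python
import numpy as np


def clustering_statistics(A):
    """Local clustering coefficients and their mean/var/skewness/kurtosis.

    A: symmetric 0/1 adjacency matrix with zero diagonal.
    Returns (c, stats) where c[j] = triads(j) / (k_j choose 2).
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    triads = ((A @ A) * A).sum(axis=1) / 2.0  # triangles through each node
    pairs = k * (k - 1.0) / 2.0
    # Assumed convention: coefficient 0 when a node has fewer than 2 neighbors.
    c = np.divide(triads, pairs, out=np.zeros_like(triads), where=pairs > 0)
    mean = float(c.mean())
    var = float(c.var())
    if var > 0.0:
        z = (c - mean) / np.sqrt(var)
        skewness = float((z**3).mean())
        kurtosis = float((z**4).mean())  # non-excess (Pearson) kurtosis
    else:
        skewness = kurtosis = float("nan")
    stats = {"mean": mean, "var": var, "skewness": skewness, "kurtosis": kurtosis}
    return c, stats
```

For a triangle with one pendant node attached, the coefficients are 1/3 for the node carrying the pendant, 1 for the other two triangle nodes, and 0 for the pendant node.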


In summary, the algorithm based on the generalized Markov Graph model can generate networks that follow a certain degree distribution together with a more stable and faster stabilizing clustering coefficient distribution than Barabasi-Albert model does. This is not very surprising in light of the generalized Markov Graph model. The algorithm itself relies on two features of the generalized Markov Graph model, namely the degree distribution and the clustering coefficient distribution. To simulate a network with $N$ nodes, the complexity of this algorithm is $O(N^4)$, which is the same as Barabasi-Albert model. The computational cost of this algorithm is twice that of Barabasi-Albert model: both the degree list of the whole network as well as the number of triads at each node have to be calculated in the new algorithm while, in Barabasi-Albert model, only the degree list needs to be calculated.

    Statistic    Barabasi-Albert model    Generalized Markov Graph model
    mean         0.0022                   1.97e-4
    var          0.0057                   7.43e-4
    skewness     0.0069                   0.0038
    kurtosis     0.0221                   0.0012

Table 6: Standard deviation of the change of the statistics of clustering coefficients

                                  Preferential Attachment Algorithm     Algorithm based on generalized
                                  from Barabasi-Albert model            Markov Graph model
    Statistics of distribution    mean      var         var/mean        mean      var         var/mean
    mean                          0.0107    8.96e-4     8.37%           0.2279    0.0011      0.483%
    var                           0.0013    1.507e-4    11.59%          0.0263    3.089e-6    0.012%
    skewness                      4.83      0.386       7.99%           0.6340    0.09        14.19%
    kurtosis                      33.8      8.12        24%             3.2171    0.3504      10.89%

Table 7: Statistics of clustering coefficients of networks generated by both models.

Detection of systematic failures. An active contour is a potential candidate for representing a systematic failure, as it represents spatially correlated failures. However, it might also be surrounding randomly failed nodes. In order to accurately detect a “systematic” failure, we also need to test for temporal correlation, which can be summarized in the following two properties:

1. Causality in failure times

2. Consistency in speeds

By causality we mean that, when seen in the direction of arrival of the propagating failure, the nodes farther away (i.e., well inside the failure region) should have failed before the nodes which are closer to the boundary. The optimization problem described earlier estimates both the speed $\nu_i$ and the direction $\mathbf{n}_i$ of propagation at a failed node $v_i$. The directional distance of a failed node $v_j$ in the direction of the propagating failure may be determined as the dot product $\langle v_j, \mathbf{n}_i \rangle$, using the local coordinates at node $v_i$. The problem then reduces to checking whether the order of failure times is the same as that of the directional distances. If the order is not the same, the causality condition is not satisfied, and we consider the failed node $v_i$ to be a randomly failed node. If all the failed nodes being observed by the active contour fail the causality test, then the active contour is considered non-representative of a systematic failure.

We also assume that in a systematic failure, the speed of propagation does not change suddenly in any given direction, a property we call consistency in speeds. We test for consistency in speeds as follows. Consider a node $v_i$ which fails at time $t$ and is enclosed by an active contour, and for which the speed of propagation is estimated as $\nu_i$ (note that this computation is performed at an active node). Let $v_{ac}$ be the node on the active contour which is closest to $v_i$. Then, if the node $v_{ac}$ is still active at time $t + r_c/\nu_i$, we declare that the failure of node $v_i$ is not systematic. If all the failed nodes being observed by the active contour fail the consistency test, then the active contour is not considered to be representing a systematic failure.
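The two temporal tests can be sketched as simple predicates. This is an illustrative sketch under stated assumptions, not the authors' code: the sign convention follows Equation (67), so a larger directional distance along the estimated direction of arrival corresponds to an earlier local failure time, and ties are tolerated. The function names `passes_causality` and `passes_consistency`, and the use of `None` to mark a contour node that is still active, are hypothetical.

```python
import numpy as np


def passes_causality(xy, t, n):
    """Check that failure-time order matches directional-distance order.

    xy: (m, 2) local coordinates of the failed nodes.
    t:  (m,) local failure times.
    n:  estimated unit direction of arrival (a, b).
    Assumed convention (from a x + b y + v t = 0): nodes with a larger
    directional distance <x_j, n> must not have failed later.
    """
    d = np.asarray(xy, dtype=float) @ np.asarray(n, dtype=float)
    t = np.asarray(t, dtype=float)
    dd = d[:, None] - d[None, :]
    dt = t[:, None] - t[None, :]
    return bool(np.all(dd * dt <= 0.0))


def passes_consistency(t_fail, speed, r_c, t_fail_contour):
    """Check that the closest contour node failed by t_fail + r_c / speed.

    t_fail_contour: failure time of the closest contour node, or None if
    that node is still active when the check is made.
    """
    if speed <= 0.0:
        raise ValueError("speed must be positive")
    deadline = t_fail + r_c / speed
    return t_fail_contour is not None and t_fail_contour <= deadline
```

An active contour would then be rejected as a systematic failure only when all of its observed failed nodes fail a given test, as described in the text.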


5.4.4  Tracking  Systematic  Failures 

Sensor networks are well suited to monitoring hazardous environments such as volcanic eruptions, wildfires, landslides, and war zones. A common feature among these environments is that they produce events which cause correlated failures in both space and time, which we call systematic. Being able to detect and track such failures becomes crucial both for the sake of tracking itself and for any emergency response thereafter. The problem of tracking systematic failures is exacerbated when some of the nodes fail randomly (due to some malfunction), which is the case considered here.

As adopted in this work, a natural methodology for tracking such systematic phenomena is based on fast and efficient identification of boundaries. We would expect any algorithm performing this task to include the following important properties: 1) the boundary output is geometrically close to the actual boundary, and 2) the interior of the boundary is topologically faithful to the original space. It is often the case that we are only given random samples from the space. We may then reconstruct the space by first placing balls of a certain radius around these points, and then taking the union of these balls.


Figure 51: The systematic failure tracking is done in two steps: (1) updating the boundary of the network at a specific time, as shown in (a), and (2) segmenting the boundary and identifying the parts surrounding a systematic failure.


Problem formulation We consider a compact region $R \subset \mathbb{R}^2$ in which a set of nodes $V$ is deployed. For a communication radius $r_c$, a communication graph $G = (V, E)$ is induced on $V$, where $(v_1, v_2) \in E \iff d_{12} = |(v_1, v_2)| < r_c$. An edge $(v_1, v_2)$ in $G$ implies that $v_1$ is within the communication region of $v_2$ (and vice versa), or in other words, $v_1$ can be observed by $v_2$. For a node $v_i \in G$ (or equivalently, $v_i \in V$), we denote by $N_i$ the neighbors of $v_i$ in $G$. Given this model for the communication graph, the notion of the deployment region $R$ may be made precise as follows.

Definition 19. The projection $C'$ of a subgraph $C \subset G$ is the collection of line segments $\overline{v_i v_j}$ in $R$ such that $(v_i, v_j) \in C$.

Definition 20. The outer boundary $\partial(R)$ is the union of the minimum number of line segments in the projection of $G$ which encloses all the nodes. The region of deployment $R$ is the region enclosed by $\partial(R)$.

We define the communication region $\mathcal{R}_c$ as the union of balls of radius $r_c/2$ with centers at the nodes in $V$. Note that when $\mathcal{R}_c$ is path connected, the communication graph is a connected graph.

Throughout,  we  use  Vi  to  denote  the  node  with  index  i  and  also  its  position  in  R.  We  assume  that  we  have 
neither  localization,  i.e.,  no  global  co-ordinates  for  the  nodes  are  known,  nor  global  orientation  information.  We, 
however,  assume  the  distances  between  the  nodes  (the  lengths  of  edges  in  the  communication  graph)  are  known. 

We  next  make  the  above  description  of  the  systematic  failures  mathematically  precise. 

Definition 21. For a time-evolving smooth and simple curve $C(s, t) : S \times \mathbb{R} \to R$, a systematic failure is defined if

1. $\forall t > 0$, $v_i \in C(s, t) \Rightarrow v_i$ fails at time $t$,

2. $C(s, t)/\partial(R)$ is closed,

3. the region enclosed by $C(s, t)$ (when $C(s, t)$ is closed), or by $C(s, t)$ and $\partial(R)$ (when $C(s, t)$ is open and intersects the boundary), is large enough to create a hole in $\mathcal{R}_c$,

4. the area enclosed by $C(s, t)$ (when $C(s, t)$ is closed), or by $C(s, t)$ and $\partial(R)$, is increasing with time.


The time evolution of the curve $C(s, t)$ can be specified by assigning a velocity vector at each point of the
curve in the normal direction. We consider only the normal component, since any tangential component
merely reparameterizes the curve. The boundary of any systematic, time-evolving failure region
can therefore be described by a time-evolving curve given as

$$\frac{\partial C(s, t)}{\partial t} = -u(s, t)\,\mathbf{n}, \qquad (70)$$

where $u(s, t)$ is a scalar specifying the speed at which the curve expands, for parameter $s$ and time $t$, and $\mathbf{n}$ is the
normal vector.
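As an illustration of the evolution equation (70), the following minimal sketch advances a closed polygonal curve by one explicit Euler step. It is not the tracking algorithm of this report: the function name `evolve_closed_curve`, the polygonal discretization, the counter-clockwise vertex ordering, and the choice of the inward unit normal (so that a positive speed expands the curve) are all assumptions made for the example.

```python
import math

def evolve_closed_curve(points, speed, dt):
    """One explicit Euler step of dC/dt = -u * n for a closed polygonal curve.

    points: list of (x, y) vertices in counter-clockwise order (assumed).
    speed:  callable (index, point) -> scalar u at that vertex (hypothetical interface).
    n is taken to be the inward unit normal, so u > 0 expands the curve.
    """
    m = len(points)
    new_points = []
    for i, (x, y) in enumerate(points):
        # Central-difference tangent from the two neighbouring vertices.
        px, py = points[i - 1]
        qx, qy = points[(i + 1) % m]
        tx, ty = qx - px, qy - py
        norm = math.hypot(tx, ty)
        tx, ty = tx / norm, ty / norm
        # Rotating a counter-clockwise tangent by +90 degrees gives the inward normal.
        nx, ny = -ty, tx
        u = speed(i, (x, y))
        new_points.append((x - u * nx * dt, y - u * ny * dt))
    return new_points

# Usage: a unit circle sampled at 64 vertices, expanding at constant speed 0.5.
circle = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64)) for k in range(64)]
for _ in range(10):
    circle = evolve_closed_curve(circle, lambda i, point: 0.5, dt=0.1)
radius = math.hypot(*circle[0])  # the radius grows by speed * dt per step
```

For a constant speed the circle stays a circle and its radius grows linearly in time, which is the simplest case of a failure region whose enclosed area increases with time.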


As shown in Figure 51(b), we correctly identify the segment of the boundary surrounding the failure in the
presence of random failures, using just the local orientation information. The speed of evolution can then be
estimated using an optimization function that has a closed-form solution and is quite robust to noise, as
shown in Figure 49. Parts of this work appeared in [111] and [112], and the complete version of the journal
paper is currently under review. A draft is available online at [113].
Summary. The tracking of a systematic failure is divided into two steps: 1) fast and localized estimation of the
network boundary at each time step, and 2) robust estimation of the segment of the boundary surrounding the failure
in the presence of noise, together with estimation of the speed of evolution. We use a geometric object called an α-shape to
define the boundary of the network, and argue why such a definition is relevant in the context of failures. In
addition, we develop a fast, localized, and coordinate-free algorithm for computing the α-shape, which has
interesting implications of its own. Figure 51(a) shows the α-shape of a network. Observe that it is topologically
faithful to the boundary of the communication region. As a by-product of our analysis, we also develop a
mathematical framework that unifies several boundary definitions adopted in the literature.

5.4.5  Relevance  to  Original  Goals 

To  facilitate  classification  of  networks  and  identify  anomalous  networks,  we  introduced  a  generalized  Markov 
Graph  model  and  its  application  to  social  network  classification  and  social  network  synthesis.  The  main  result 
of  the  generalized  Markov  Graph  model  is  that  the  degree  distribution,  the  clustering  coefficient  distribution,  and 
the crowding coefficients are three fundamental statistics for characterizing generic social networks. In addition,
the  generalized  Markov  Graph  model  provides  a  new  insight  into  clustering  coefficient:  it  is  the  result  of  the 
dependence  between  higher  order  structures,  namely  the  triads,  in  social  networks. 

To detect and track systematic failures in networks, we presented a distributed algorithm that tracks a
systematic failure in sensor networks and is robust to random failures. The algorithm has low
communication complexity and requires only a small number of local communications, which makes it well suited to
real-time applications. We assume neither global coordinate information nor any capability of directly sensing the
phenomenon causing the failure. We presented precise mathematical formulations of the algorithms along with
proofs of their correctness. The simulations performed demonstrate the accuracy and robustness of our methods.

6  Cascading  Failures  in  Power  Grids  due  to  Communication  and  Cyber  Attacks 

The second issue that our research explores is the formation and properties of correlated failures in communication
networks. Following a WMD event, one or multiple failures can be identified. Unfortunately, this is not
the end of the story: the intrinsic nature of networking and communication inevitably surrenders a communication
network to these failures, which in turn create more and more failures. This is recognized as
an epidemic and detrimental phenomenon in every inter-dependent architecture, including social networks and
transportation systems. The majority of existing studies treat specific failures as a priori knowledge and design
countermeasures for them. However, there are very few studies on the evolution process of these failures or
on the extent of damage to network composition and structure. This work is therefore a first effort in modeling
and exploring the impact of correlated failures, which would greatly contribute to the failure resilience of large
networks.

By analyzing the correlated failures, we modeled the impact of traffic overloading. To further study the
performance impact of failures, we need to assess the performance impact of cyber-physical attacks with new metrics.
Moreover, we aim to study the scale properties of attacks. The detection and localization of failures provide
information on where a failure occurs and what its impact is on the topological composition. However, failures
in the same location may have varying impact depending on the attack method. We seek answers to questions
such as: 1) What properties must an attack have in order to cause severe damage to communication
networks? 2) Are there asymptotic bounds on an attack in the temporal domain? 3) What is the scaling law in
large-scale networks?

6.1  Performance  Impact  of  Cyber  Attacks 

6.1.1  Objectives  and  Approaches 

Cyber-physical attacks not only sabotage vulnerable or impaired nodes, but also reproduce themselves
to propagate the damage epidemically. Physical damage to communication links, in
contrast to cyber attacks, may be easily detected. Assessing the impact of such
destruction, however, is challenging. When quantifying network vulnerability in the temporal-spatial domain, widely used
performance metrics such as throughput or delay cannot illustrate the impact of failures. Therefore, we need to propose
new metrics and, more importantly, a new method to study the problem.

Our methodology is to study the gain that a misbehaving node can obtain via two general classes of backoff
misbehavior. The first class is continuous misbehavior, in which a node misbehaves persistently and does
not stop until it is disabled by countermeasures. The second class is intermittent misbehavior, in which a node,
in contrast to continuous misbehavior, misbehaves during on periods and behaves legitimately during off
periods. The goal of intermittent misbehavior is to obtain benefits over legitimate nodes while
evading misbehavior detection. We then define both legitimate users and misbehaving users formally, in the
sense that their attack times and choice of attack are represented by mathematical models.

One  of  the  main  contributions  of  our  work  is  to  define  a  new  metric,  namely,  “order  gain”  to  quantify  the 
benefits  of  backoff  misbehaving  nodes,  or  the  damage  to  legitimate  users.  Then  based  on  this  measure,  we 
examine  the  responses  of  different  attack  models  and  identify  the  most  harmful  attack  models  for  which  the 
countermeasures  need  to  be  designed. 

Main Results: We define a new metric, order gain, to measure the performance
benefits of misbehaving nodes over legitimate nodes, which is helpful in evaluating the gain and impact of a
misbehaving node in a CSMA/CA-based wireless network. We find that the order gain of a continuous double-window
backoff misbehaving node always converges to $\log_2(p/p_D)$ as $t \to \infty$, where $p$ and $p_D$ are the collision
probabilities of legitimate and misbehaving nodes, respectively, whereas the order gain of a continuous fixed-window
backoff misbehaving node increases to infinity as $t \to \infty$. We also find that the order
gain of an intermittent misbehaving node always converges as $t \to \infty$, regardless of the misbehavior scheme it
chooses in the on state.

Our analytical and experimental results show that both double-window backoff misbehavior and fixed-window
backoff misbehavior achieve significant gains when the number of users is small. However, double-window backoff
misbehavior is more sensitive to the number of users and has only marginal gains as the number of users increases.
Thus, the number of users can be considered as a factor in deciding whether to deploy a counter-strategy in a
wireless network. We also find that an intermittent misbehaving node cannot achieve substantial gain by setting
a short on period to perform misbehavior. Thus, even though an intermittent misbehaving node may evade misbehavior
detection, it cannot cause significant damage to a wireless network.

Figure 52: A single transmission in a simple slotted CSMA/CA network.


6.1.2  What  is  Order  Gain? 

The benefit sought by a misbehaving node can be either gaining more resources for itself or degrading network
performance. In the first case, a selfish node tries to acquire a higher chance to access the channel than legitimate
nodes, which is quite easy to do in practice. However, the effect of such misbehavior can be devastating, and it is therefore
one of the earliest and most well-studied issues for wireless access networks [?, ?, ?, ?]. In the second case,
the goal of malicious nodes is to disrupt normal network operation. Such nodes are often referred to as jammers
[?, ?]. In this work, we focus on the first case, which can in fact evolve into the second case under certain conditions.

Network performance, on the other hand, can be evaluated by a number of metrics, such as throughput,
which can be the data transmission rate of one user or the aggregate rate of a group of users. There have been
many works on throughput analysis of CSMA/CA networks, such as [?] and [?]. On close inspection,
many analyses are based on the waiting time and the transmission time. For example, Figure 52 illustrates
a transmission in a slotted CSMA/CA network. During the transmission in Figure 52, the
throughput can be computed as $\eta$ = transmission time/(waiting time + transmission time) = 1/7. We can see
that the throughput $\eta$ is in fact a consequence of the waiting time, that is, the number of slots during which the node contends
for the channel. Therefore, the waiting time directly represents the performance of a node: the longer the
waiting time, the worse the performance, and vice versa. We define the waiting time as follows.

Definition 22 (Waiting time). The waiting time of a node is the number of slots between the instant that the node
starts to contend for the channel to transmit a packet and the instant that the node successfully transmits the
packet; namely, the waiting time $W = \sum_{i=0}^{N} T(i)$, where $N$ is the number of collisions before the node makes a
successful transmission and $T(i)$ is the random backoff time after the $i$-th collision.
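The waiting-time definition can be made concrete with a small sampler. This is a minimal sketch under stated assumptions rather than the model analyzed in the report: the function name `sample_waiting_time`, the constant per-attempt collision probability `p`, the initial window `cw_min`, and the uniform draw from a window that doubles after each collision (binary exponential backoff without a retry limit) are chosen for illustration.

```python
import random

def sample_waiting_time(p, cw_min=16, rng=random):
    """Draw one waiting time W = sum_{i=0}^{N} T(i) under binary exponential backoff.

    p:      collision probability of each transmission attempt (assumed constant).
    cw_min: initial contention window in slots (hypothetical default).
    T(i) is uniform on {0, ..., 2**i * cw_min - 1}; no retry limit is applied.
    """
    waiting, i = 0, 0
    while True:
        waiting += rng.randrange((2 ** i) * cw_min)  # backoff T(i)
        if rng.random() >= p:                         # the attempt succeeds
            return waiting
        i += 1                                        # collision: double the window
```

Drawing many samples for two different values of `p` or `cw_min` gives empirical tail distributions for a legitimate node and a node with a smaller window, which can then be compared.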

We have shown that the waiting time is essential to the performance of a node. However, our objective is
not to evaluate the performance of a single node but to understand the benefits of backoff misbehaving schemes, that
is, the gains of misbehaving nodes over legitimate nodes. To achieve this goal, we introduce a performance
metric that satisfies the following constraints:

•  The  definition  of  the  metric  must  be  generic,  without  depending  on  a  particular  protocol.  This  is  due  to 
the  wide  deployment  of  CSMA/CA  networks,  especially  IEEE  802.11  and  IEEE  802.15.  Therefore,  the 
definitions  of  control  messages,  such  as  RTS/CTS,  ACK  should  not  affect  the  interpretation  of  the  gain. 
• The metric cannot be limited to first-order statistics, that is, the means of random variables
such as mean delay. First-order statistics in general cannot be extended to other performance
metrics. For example, the delay jitter [?], which is a second-order statistic, cannot be calculated from
the mean delay alone.

• If the gain of node A over node B is $G_1$ and the gain of node B over node C is $G_2$, then the gain of node A
over node C is $G_1 + G_2$. This property is very important because it enables us to quantitatively compare
the impacts of two misbehaving nodes by directly comparing their metrics.


With the above considerations, we introduce a new metric, namely the order gain of waiting time$^4$, as follows.


Definition 23 (Order gain of waiting time). Let $W_A$ and $W_B$ be the waiting times of nodes A and B, respectively.
The order gain of node A over node B is defined as

$$G(t) = \log_t \frac{\mathbb{P}(W_B > t)}{\mathbb{P}(W_A > t)}, \qquad (71)$$

where $\mathbb{P}(W_A > t)$ and $\mathbb{P}(W_B > t)$ are the tail distribution functions (or complementary cumulative distribution
functions, CCDFs) of $W_A$ and $W_B$, respectively.


Remark 15. The definition of order gain is based on the tail distribution functions of nodes A and B. The tail
distribution function $\mathbb{P}(W_A > t)$, for example, denotes the probability that the waiting time of node A is greater
than a given $t$, showing how often the waiting time of node A exceeds a given value. Thus, $\mathbb{P}(W_A > t)$
indicates the performance of node A, since the larger the waiting time, the smaller the chance for the node
to access the channel.
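The order gain can be estimated from measured waiting times by replacing each tail distribution with its empirical counterpart. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the logarithm in the definition is taken to base $t$, which is consistent with the gain of a double-window node converging to the constant $\log_2(p/p_D)$.

```python
import math

def tail_probability(samples, t):
    """Empirical CCDF: fraction of samples strictly greater than t."""
    return sum(1 for w in samples if w > t) / len(samples)

def order_gain(samples_a, samples_b, t):
    """Empirical order gain of node A over node B at waiting time t > 1.

    Assumes a base-t logarithm of the ratio of tail probabilities.
    Returns math.inf when A never waits longer than t but B sometimes does.
    """
    if t <= 1:
        raise ValueError("t must exceed 1 for a base-t logarithm")
    tail_a = tail_probability(samples_a, t)
    tail_b = tail_probability(samples_b, t)
    if tail_b == 0:
        raise ValueError("tail of B is empty; the gain is undefined at this t")
    if tail_a == 0:
        return math.inf
    return math.log(tail_b / tail_a, t)

# Usage with toy waiting times (in slots) for three nodes.
a = [1, 2, 3, 20]
b = [5, 20, 30, 40]
c = [15, 20, 30, 40]
gain_ab = order_gain(a, b, 10)
gain_bc = order_gain(b, c, 10)
gain_ac = order_gain(a, c, 10)
```

Because the gain is a logarithm of a ratio, the gains of A over B and of B over C add up to the gain of A over C, which is the additivity property required of the metric.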


The most commonly used misbehaving backoff schemes are double-window and fixed-window misbehavior,
both of which are forms of continuous misbehavior and have been extensively studied with regard to detection schemes [?,
?] and incentive-based protocols [?, ?]. Therefore, in this section, we first study these two continuous misbehavior schemes:
double-window misbehavior, which conforms to binary exponential backoff but chooses a smaller minimum
contention window than legitimate nodes, and fixed-window misbehavior, which chooses a random backoff time
uniformly from a fixed range. We then move on to intermittent backoff misbehavior, in which a misbehaving
node performs misbehavior in the on state and legitimate backoff in the off state.

We  summarize  the  main  results  on  the  order  gain  of  double-window  misbehavior  as  follows. 

Theorem 20. The order gain of a double-window backoff misbehaving node over legitimate nodes is

$$G_D(t) = \log_2\left(\frac{p}{p_D}\right) + \Theta\left(\frac{1}{\ln t}\right),^5$$

where $p$ and $p_D$ are the collision probabilities of the legitimate and misbehaving nodes, respectively.

$^4$ This is referred to as order gain throughout this paper unless otherwise specified.

$^5$ We say that function $f(x)$ is of the same order as function $g(x)$, and write $f(x) = \Theta(g(x))$, if and only if there exist two positive real
numbers $c_1$ and $c_2$ and a real number $x_0$ such that $c_1|g(x)| \le |f(x)| \le c_2|g(x)|$ for all $x > x_0$.
Figure 53: Order gain of a double-window backoff misbehaving node in an 802.11 network with different numbers
of legitimate nodes.

Remark 16. Theorem 20 shows that $G_D(t)$ converges to $\log_2(p/p_D)$ as $t \to \infty$, showing that the order gain
of double-window misbehavior depends on the ratio of the collision probabilities of legitimate and misbehaving
nodes. It has been shown in [2] that the ratio $p/p_D \to 1$ as the number of nodes in a network goes to infinity,
which in turn indicates that the performance gain of a double-window misbehaving node becomes marginal as
the number of nodes increases.
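The limit $\log_2(p/p_D)$ can be motivated by a short heuristic calculation. The steps below are a sketch under simplifying assumptions, not the proof of Theorem 20: collisions are taken to be independent with a constant probability, no retry limit is applied, the waiting time is assumed to be dominated by the last backoff window, the order gain is taken as a base-$t$ logarithm, and $c$, $c'$, $c''$ denote unspecified positive constants introduced only for this illustration.

```latex
% Heuristic sketch; W and W_D are the waiting times of a legitimate node and
% of a double-window misbehaving node, N is the number of collisions.
\begin{align*}
\mathbb{P}(N \ge n) = p^{\,n}, \quad W \approx c\,2^{N}
  \;&\Longrightarrow\; \mathbb{P}(W > t) \approx p^{\log_2 (t/c)} = c'\, t^{\log_2 p},\\
\frac{\mathbb{P}(W > t)}{\mathbb{P}(W_D > t)}
  &\approx c''\, t^{\log_2 p - \log_2 p_D} = c''\, t^{\log_2 (p/p_D)},\\
G_D(t) = \log_t \frac{\mathbb{P}(W > t)}{\mathbb{P}(W_D > t)}
  &\approx \log_2 \frac{p}{p_D} + \frac{\ln c''}{\ln t}
  \;\xrightarrow[t \to \infty]{}\; \log_2 \frac{p}{p_D}.
\end{align*}
```

For instance, with hypothetical values $p = 0.4$ and $p_D = 0.2$ the limit is $\log_2 2 = 1$, while $p/p_D \to 1$ drives the limit to 0, in line with the marginal gain in large networks.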

Remark 17. We also find that the order gain of fixed-window backoff misbehavior increases to
infinity as $t \to \infty$ regardless of the number of nodes in the network, which indicates that a misbehaving node can
always obtain substantial benefits from fixed-window backoff misbehavior. Thus, any countermeasure to backoff
misbehavior should consider fixed-window backoff misbehavior as its primary target.


6.1.3  Intermittent  Backoff  Misbehavior 


We have derived the order gains of the two widely used schemes for continuous misbehavior. However, a misbehaving
scheme is not always guaranteed to be continuous, especially when there exists a counter-strategy in the
network that tries to detect and disable any misbehavior. It has been shown in [?] that a node that performs
misbehavior intermittently may evade such misbehavior detection. Thus, it is important to know the gain of an
intermittent misbehaving node in a network. The backoff scheme of an intermittent misbehaving node is defined
as a Markov process with on and off states. With this definition, we state our result on intermittent misbehavior.


Theorem  21.  The  order  gain  of  an  intermittent  misbehaving  node  over  legitimate  nodes  satisfies 


Gr{t)  =  log2^  +  0(  r^- 


P, 


1 


In  t 


where $p_{on}$ and $p_{off}$ are the collision probabilities of legitimate nodes in the on and off states, respectively.


Theorem 21 shows that, perhaps surprisingly, the order gain of an intermittent misbehaving node $G_I(t)$
always converges as $t \to \infty$ regardless of the misbehaving backoff scheme in the on state.
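The on/off behavior of an intermittent misbehaving node can be sketched as a two-state Markov chain. The report states only that the backoff scheme is a Markov process with on and off states; the discrete-time formulation, the transition probabilities `p_on_to_off` and `p_off_to_on`, and the function names below are assumptions made for illustration.

```python
import random

def on_state_ratio(p_on_to_off, p_off_to_on):
    """Stationary fraction of time a two-state chain spends in the on state."""
    return p_off_to_on / (p_on_to_off + p_off_to_on)

def simulate_on_off(p_on_to_off, p_off_to_on, steps, rng=random):
    """Simulate the chain step by step and return the observed fraction of on steps."""
    on, on_steps = True, 0
    for _ in range(steps):
        on_steps += on                      # count the current step if the node is on
        flip = p_on_to_off if on else p_off_to_on
        if rng.random() < flip:             # switch state with the given probability
            on = not on
    return on_steps / steps
```

In such a model, the long-run fraction of time spent in the on state plays the role of an on-state ratio: a node that rarely misbehaves corresponds to a small ratio, and a ratio close to 1 approaches continuous misbehavior.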




Figure 54: Order gain of an intermittent misbehaving node in an 802.11 network with 5 legitimate nodes.

We use ns2 simulations to further assess the performance of intermittent misbehavior by considering an
802.11 network consisting of five legitimate nodes and one intermittent misbehaving node. The intermittent
misbehaving node misbehaves by choosing a random backoff time uniformly from [0, 7] when it is on.
Figure 54 shows the order gains of the intermittent misbehaving node for different on-state ratios θ. We
see from Figure 54 that the order gain of the misbehaving node always has an initial increasing phase and, after
reaching a maximum, decreases toward its limit, which shows that there exists a phase transition phenomenon in
the order gain of intermittent misbehavior. The phase transition phenomenon is more evident when θ becomes
large. We denote by t* the phase transition point, which is the value of waiting time corresponding to the
maximum of the order gain. During simulations, we find that t* increases as θ increases, but the increment is not
significant. For example, in Figure 54, t* increases from 18 to 33 as θ goes from 50% to 99%.

Figure 54 also shows that the order gain of the intermittent misbehaving node is not significant when θ
is small. For example, when θ = 50%, the order gain is always smaller than 0.35 and the phase transition
phenomenon is not evident. When θ = 70%, the order gain is also upper bounded by 0.6. Consequently, our simulation
results indicate that if an intermittent misbehaving node tries to evade misbehavior detection by choosing
a small θ, its performance gain is not significant.

The goal of an intermittent misbehaving node is to achieve a performance gain over legitimate nodes while
evading misbehavior detection. We find that, interestingly, if an intermittent misbehaving node
chooses a small θ to evade misbehavior detection, it cannot achieve substantial gains. For example, the order gain
is always smaller than 0.35 when θ = 50%. On the other hand, if an intermittent misbehaving node chooses a large
θ to achieve substantial gains, it may not be able to evade misbehavior detection because it behaves similarly to a
continuous misbehaving node. For example, we can see in Figure 54 that when the intermittent misbehaving
node has θ = 99%, its order gain is similar to the “100%” order gain when the waiting time t is small, which
shows that it has a higher risk of being detected.
6.1.4 Discussions and Summary


So far, we have studied the problem of quantifying the gain of backoff misbehavior and obtained the order gains
for two continuous backoff misbehavior schemes and the intermittent misbehaving scheme, which are validated
by simulation. We further present experimental results to illustrate the impact of backoff misbehavior. Our
findings can be summarized as follows:

1. Double-window misbehavior is more sensitive to the number of users than fixed-window misbehavior and
can achieve only marginal gains as the number of users increases. Conversely,
the performance loss of legitimate nodes due to double-window misbehavior is not significant in a network
with a large number of users.

2. Fixed-window misbehavior can always achieve substantial gains over legitimate nodes regardless of the
number of users. Therefore, fixed-window misbehavior should always be the primary target of countermeasures
to backoff misbehavior.

3. An intermittent misbehaving node cannot achieve significant gain when it chooses a small θ to evade
misbehavior detection.

The above results are studied from a “gain” perspective. Note that network resources are limited and finite,
especially when a number of users share the same medium. In other words, when some users gain throughput or
bandwidth, others can potentially lose their transmission opportunities, resulting in zero user throughput.
A trivial example is a user that occupies the channel for the entire time period, regardless of whether it transmits useful
data, which is an extreme form of misbehavior known as a “jamming” attack [?]. When this happens,
the entire network appears to be dysfunctional and may even be inaccessible to legitimate nodes. It is interesting to
see that the tail distribution function of a jammer’s waiting time satisfies $\mathbb{P}(W_J > t) = 0$ for all $t > 1$, since the jammer never
backs off. Then, a jammer’s order gain is $G_J(t) = \infty$ for all $t > 1$, showing that the jammer has “infinite gains”
over legitimate nodes.

It is worth mentioning that our results have several limitations. 1) We did not consider the upper limits
on the contention window and retransmissions for legitimate nodes, such as the short-retry limit of 7 in the basic
access model of 802.11 DCF. Thus, the order gain is in fact a theoretical metric of the performance gain of backoff
misbehaving nodes. Nevertheless, we believe our results are still applicable in practical scenarios. For instance,
a legitimate node will start a new transmission after it reaches the upper limit on retransmissions, which means its
chance to access the channel becomes larger. Thus, our results should provide an upper bound on the performance
gain of misbehaving nodes in a practical network. 2) Our experiments are limited to a small-scale, single-hop
network. Thus, our experimental results may not reveal the performance and impact of misbehaving
nodes in more complicated wireless environments. 3) We acknowledge that in the real world, a misbehaving
node may be more sophisticated than the models defined in this paper. However, our analytical framework can
still serve as a preliminary study for more complicated misbehavior models.

Cyber attacks, such as viruses, not only sabotage vulnerable or impaired nodes, but also reproduce themselves
to propagate the damage epidemically. Physical damage to communication links,
in contrast to cyber attacks, may be easily detected. Assessing the impact of
such destruction, however, is challenging.

This study provides another perspective for understanding the scale properties of attacks. The detection and
localization of failures provide information on where a failure occurs and what its impact is on the topological
composition. However, failures in the same location may have varying impact depending on the attack method.
We seek answers to questions such as: 1) What properties must an attack have in order to cause severe
damage to communication networks? 2) Are there asymptotic bounds on an attack in the temporal domain?
3) What is the scaling law in large-scale networks?

6.2  Tracking  Power  Grid  Vulnerability  under  Data-Centric  Attacks 

A smart grid is a cyber-physical system that integrates communication networks into the traditional power grid.
This integration, however, makes the power grid susceptible to cyber attacks. One of the most prominent
challenges in studying the aftermath of cyber attacks in the smart grid is the class of data-centric threats. In power
grids, these data-centric attacks may result in unstable power systems and a detrimental impact on power
supplies. In this paper, we present Greenbench, a benchmark designed to evaluate real-time power grid
dynamics in response to data-centric attacks. The simulation results provide several counter-intuitive suggestions
for both smart grid security research and deployment.

6.2.1  Motivation 

As a prospective replacement for the traditional power grid, the smart grid promises more reliable, effective, and
efficient power delivery and distribution by integrating advanced communication technologies into the traditional
power grid. This integration, however, brings a new host of vulnerabilities stemming from the Internet and opens the door
for potential adversaries to tear down a physical system through a cyber attack.

Aware of these risks, researchers have begun to study potential cyber attacks and develop defense schemes
to protect this cyber-physical system [114, 115]. However, a practical security solution remains daunting, partly
because of the lack of a commonly recognized platform for evaluating attack and defense schemes. A question arises
when we try to classify various attacks so that we can develop protection solutions in a prioritized way: How
do we analyze, simulate, and evaluate the physical impact caused by a cyber attack in the smart grid?

To address this question, we focus on data-centric threats in smart grids. A data-centric attack in a cyber
system aims to gain an advantage by manipulating the data exchanged within the system. Although they vary in form,
data-centric attacks always target one or more of three basic attributes: Confidentiality,
in which the attacker gains access to data that is not supposed to be disclosed to him; Integrity, in which the
attacker distorts the content of data; and Availability, in which the attacker blocks or delays data delivery to
legitimate users. These three attributes are the basis of information security, and a breach of any of them may
have disastrous consequences.

Although such attacks are critical in an information network, they have a far greater cascading
impact in the power grid than in the cyber world. In an information-centric network, distorted or delayed
information undermines services and applications. In a power grid, however, these data-centric attacks may result in
bursty power flows, unstable power systems, and a detrimental impact on power supplies.

Critical as they are in the cyber domain, the impact and destructiveness of data-centric attacks can be amplified
significantly when they are brought into cyber-physical systems such as the smart grid. From academic research
such as the false data injection attack [116], which points out a design flaw in the monitoring system of the modern
power grid, to practical attacks such as Stuxnet [117], which damaged equipment at a nuclear facility by infecting and
distorting control data, it is clear that data-centric cyber attacks are real and that the demand for defenses is
urgent.

6.2.2  Green  Hub:  A  Micro  Smart  Grid 
Figure 55: Cyber-physical system. (a) Green Hub physical system. (b) Green Hub cyber system.


Our  objective  is  to  develop  a  cross-domain  simulation  platform  that  can  be  used  to  demonstrate  the  interaction 
and  inter-dependency  of  cyber  attacks  and  power  grid  in  real-time.  As  a  platform,  we  use  Green  Hub  as  the 
underlying  physical  system  for  our  study. 

The Green Hub system is a novel distribution-level microgrid that has been developed at the Future Renewable
Electric Energy Delivery and Management (FREEDM) Systems Center for the study of power management
strategies [118]. The Green Hub is abstracted from an actual residential distribution system, which is a
230kV/22.86kV substation along with two 22.86kV distribution feeders in the Raleigh area, with the substation
voltage reduced to 69kV/12kV to fit the purpose of our study. The Green Hub contains various innovative power devices
developed at the FREEDM center, such as the Solid State Transformer (SST) and the Fault Isolation Devices
(FIDs), and it is also connected to green energy sources such as Photovoltaic (PV) and Wind Turbine (WT) units.
All of these devices are equipped with Intelligent Electronic Devices (IEDs), which are ARM-based embedded systems
used for real-time control, monitoring, and communication. The IEDs interact with each other to make the
Green Hub an autonomous micro smart grid that can either be connected to the main power grid or operate
in an isolated mode.

In order to use this power subsystem for our study, we have to deal with two issues as follows:

• System abstraction: An actual system includes a large number of various devices, which makes it
unsuitable for study and simulation; thus it is necessary to abstract a simplified high-level system with a
suitable size and to omit the minor details. The abstracted power system is shown in Figure 55(a), which is
a 17-bus power distribution system. Each bus is connected to an SST, which is able to implement
bi-directional energy flow and DC/AC transformation. Each SST is connected to a load (Load represents
AC load, PHEV represents DC load) and a renewable energy source (PV, WT, or DESD). To ensure the
reliability of the system, four FIDs are deployed on different feeder segments; each opens its circuit
breaker and isolates the failure from the upper-level power grid when a fault happens.

• Domain mapping: The challenge here is to map the physical domain into the cyber domain by replacing each
physical device with its corresponding IED. The mapped cyber domain system is shown in Figure 55(b).
A smart meter is used to represent an AC load, as it is the typical controller for AC loads such as households or
buildings. Also shown in this figure are the different network access methods for the various IEDs (controllers),
which reflect the enabling work under way in the FREEDM center. Specifically, the SST, PHEV, PV and WT
controllers are connected to the communication network using Ethernet, the DESD controller is connected
using Zigbee, and the smart meter uses wireless to access the network.
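
The domain mapping above can be written down as a small lookup table. The following Python fragment is only an illustrative sketch: the dictionary name, the key spellings, and the helper function are hypothetical and are not part of Greenbench; only the controller-to-medium assignments are taken from the text.

```python
# Access technology per controller (IED) type, as listed in the domain-mapping
# bullet. Names and key spellings are hypothetical, chosen for illustration.
ACCESS_METHOD = {
    "SST": "Ethernet",
    "PHEV": "Ethernet",
    "PV": "Ethernet",
    "WT": "Ethernet",
    "DESD": "Zigbee",
    "SmartMeter": "Wireless",
}


def controllers_on(medium):
    """Return the controller types that reach the network over `medium`."""
    return sorted(name for name, m in ACCESS_METHOD.items() if m == medium)
```

For instance, `controllers_on("Wireless")` returns only the smart meter, which is the component targeted by the attacks studied later.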


The framework of Greenbench with its software implementation architecture is shown in Figure 56. We
briefly introduce the architecture and the functionality of each block of the framework, and leave the detailed
description and design challenges to the next section.

The Greenbench framework is functionally composed of two parts (simulators): the physical part (PSCAD)
and the cyber part (OMNeT++). The physical and cyber domain models shown in Figure 55 are each built in their
corresponding part, and the two parts interact through two interfaces: the interactor and the buffer files.


Figure 56: Software implementation of Greenbench.

6.2.3 Delayed and Distorted Data-Centric Attacks


In a data-centric attack, the attacker aims at gaining an advantage or causing damage by manipulating the data
exchanged between network entities. This kind of attack is even more dangerous in the smart grid because, instead
of interrupting applications and services in the cyber world, it disturbs and damages critical infrastructure and
can potentially cause disastrous losses that are not only economic. In order to better understand
its impact and to find effective solutions as well as instructive suggestions, we study the data-centric attack in
the smart grid by focusing on attacks that target smart meters.

The smart meter in AMI is one of the most vulnerable components in the smart grid. First, it is
physically accessible to the public. Second, it uses wireless communication, which is susceptible to jamming attacks
[119] and easy to eavesdrop on [120]. Last and most important, its security is usually overlooked by manufacturers,
and it was not designed to resist any cyber attack [121, 122]. However, a system is only as strong as its weakest link, and
it remains an open question whether omitting security features on smart meters is reasonable. To address
this question, we select three cases covering different security aspects and study them in Greenbench:
a delayed data attack, a distorted data attack, and a composite attack.

The metrics usually used to observe the state of a power system are voltage, current, real power and reactive
power. For the simulated power system, the voltage at each point remains unchanged unless an overload
happens, but the current keeps changing as the load varies, while the real power and reactive power
follow the same pattern during our simulation. Therefore, we use current and real power to illustrate the
state change of the Green Hub hereafter.

For ease of description, we divide the Green Hub shown in Figure 55(a) into 4 sections: Section 1 starts after
FID1 and includes loads 1, 5, 6, 7, 8, and 9; Section 2 starts after load 10 and includes loads 11, 12, 13, and 14;
Section 3 starts after FID3 and includes loads 2, 3, and 4; and Section 4 starts after FID4 and includes loads 15,
16, and 17. Note that load 10 does not belong to any of the sections.
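
As a quick cross-check of this partition, the sections can be encoded as sets. This is a minimal sketch, assuming the loads are numbered 1 to 17 to match the 17-bus abstraction; the names `SECTIONS`, `ALL_LOADS`, and `section_of` are hypothetical.

```python
# Section membership as described in the text. Numbering the loads 1..17 is an
# assumption based on the 17-bus abstraction.
SECTIONS = {
    1: {1, 5, 6, 7, 8, 9},
    2: {11, 12, 13, 14},
    3: {2, 3, 4},
    4: {15, 16, 17},
}
ALL_LOADS = set(range(1, 18))


def section_of(load):
    """Return the section number containing `load`, or None if unassigned."""
    for section, loads in SECTIONS.items():
        if load in loads:
            return section
    return None


# Loads that belong to no section; the text states this is load 10 only.
unassigned = ALL_LOADS - set().union(*SECTIONS.values())
```

Under that numbering assumption, the sixteen sectioned loads plus load 10 account for all 17 buses.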

Delayed Price Information in AMI In this case study we simulate and analyze the “jamming the price signal
attack” proposed in [119]. In particular, it is assumed that consumers' power consumption is
based on the pricing information, which is a continuously changing variable. The pricing information is sent to
consumers (smart meters) by an aggregator via a wireless link, and the attacker is able to jam the pricing signal
within a certain area. During the jamming, the consumers keep their power consumption unchanged because
they do not have up-to-date pricing information. When there is a significant change in the pricing information,
the attacker stops jamming. The sudden change in the pricing information causes a significant change in
power consumption within a short time, and consequently affects power grid stability.

In this case we assume that the attacker has compromised load controllers (smart meters) 11, 12, 13, 15, 16, and
17, which are located close to each other geographically. We also assume the extreme case in which, during the jamming
attack, consumers simply do not consume any power, and then operate under full load when the jamming stops
and the updated pricing signal is received. As a comparison, we also analyze this scenario and simulate it in a single
domain using PSCAD.

When considered in the cyber-physical cross domain, however, the single-domain scenario setup is
over-idealistic. In practice the smart meters will not be able to communicate with the wireless aggregator at exactly the
same time, because the wireless channel can only be used by one host at a time. A more realistic simulation is
deployed in Greenbench, and the simulation setup is shown in Figure 57. Wireless aggregator 1 (WA1) is the
access point for loads 15, 16, and 17, while wireless aggregator 2 (WA2) is the access point for loads 11, 12, and
13. There is no interference between the WA1 and WA2 areas, but hosts within each area contend for access to the
wireless channel. The physical load is assumed to be connected to the power system immediately once its
load controller gains access to its WA and its connection request is received by the control center.

Figure 57: Jamming the Price Signal attack.

Because of wireless channel contention, the connection requests from those 6 loads do not arrive at the control
center at the same time, and hence the physical loads also take turns being connected to the main power grid. Although
the time between successive load connections is very short, it is enough for the power grid to prepare for
the load change, and therefore the changes in current and real power are much smoother than with wireline
communications, which indicates that system stability is unlikely to be affected.
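
The buffering effect of channel contention can be illustrated with a toy model. The sketch below is not the OMNeT++ model used in Greenbench: it assumes a simplified slotted random-backoff scheme, and the slot length, contention window, seed, and all function names are hypothetical. It only shows that when hosts must win the channel one at a time, the loads join the grid at distinct instants instead of all at once.

```python
import random
from collections import Counter


def connection_times(n_hosts, slot=0.001, window=16, seed=0):
    """Toy slotted-backoff contention (assumed model, not 802.11 DCF).

    In each round every pending host draws a backoff counter; the host with
    the unique smallest counter wins the channel and connects. A tie is a
    collision and all pending hosts retry. Returns {host: connection time}.
    """
    rng = random.Random(seed)
    pending = list(range(n_hosts))
    now = 0.0
    times = {}
    while pending:
        draws = {h: rng.randrange(window) for h in pending}
        smallest = min(draws.values())
        winners = [h for h, d in draws.items() if d == smallest]
        now += (smallest + 1) * slot  # channel is busy until the slot ends
        if len(winners) == 1:
            times[winners[0]] = now
            pending.remove(winners[0])
    return times


def largest_step(times, load_per_host=1.0):
    """Largest amount of load that connects at one single instant."""
    per_instant = Counter(times.values())
    return max(per_instant.values()) * load_per_host


# Three meters behind one aggregator, with and without contention.
staggered = connection_times(3)
simultaneous = {h: 0.0 for h in range(3)}
```

In this toy model `largest_step(simultaneous)` is three load units while `largest_step(staggered)` is one, which mirrors the qualitative observation above; it says nothing about the actual time scales in the Greenbench simulation.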

Remark 18. In this case, the attack causes a real load change, and the attacker’s goal is to destabilize
the power system through the sudden load change. A similar attack named the “distributed internet-based load altering
attack” [123] is also of this type: the attacker is assumed to have gained control of smart meters over a
large area, and by turning off a large amount of household load, e.g., water heaters in 1000 homes, negatively affects
power grid stability. However, as shown by the Greenbench simulation, this type of attack actually
bears low risk, mainly because the contention period of wireless communication acts as a buffer which mitigates
the “sudden” change so that the power grid has enough time to prepare for the load change.

Figure 58: Load Redistribution attack.

Figure 59: Load redistribution attack simulation in Greenbench.

Distorted Load Attacks In this case the Load Redistribution (LR) attack [124] is simulated. The LR attack is
a special type of false data injection attack [116]. A false data injection attack is an attack in which
carefully designed false data are added to a certain group of monitored data in the power grid in such a way that the
false data cannot be detected by the state estimation algorithm used to detect bad data in the power grid.
The false data are accepted and used by the control center to make decisions, although they are inconsistent with
the real device status, and this inconsistency may cause unpredictable damage to the power grid.

In the LR attack, constraints are placed on the attackable nodes in the smart grid, which makes the LR
attack more practical and easier to launch. In particular, while the original false data injection attack
treats all nodes homogeneously, the LR attack assumes that only the load nodes are attackable. Note that
in this attack, the attacker’s goal is not to change the real load (the power consumed by a device) but to modify
the load reading, which is the monitored value sent to the control center. We use real load and load reading
to differentiate the two concepts hereafter.

As in case I, we assume the attacker has compromised the meters which provide readings for loads 11, 12,
13, 15, 16, and 17. Two special constraints of the LR attack are that the overall real load consumption of the
attacked area remains the same, and that the load reading change for each specific load does not exceed 50% of its
original load. According to these constraints, we set up the attack scenario as follows:

1. Assume the attacker increases the load reading at loads 15, 16, and 17, and decreases it at loads 11, 12, and
13. The total increase at loads 15, 16, 17 and the total decrease at loads 11, 12, 13 sum to zero.

2. The attack is launched in 3 time steps with a 0.1 second interval between steps. In each step, the attacker
increases the load reading at loads 15, 16, and 17 by 15% of their original load, and at the same
time decreases the load reading at loads 11, 12, and 13 by the same amount. The total load reading change
for each load is 45% of its original load.

3. Note that in this simulation, our goal is different from that of [124]. In [124], the goal of the attack is to find
a combination of load redistribution which causes the maximum cost, while our goal is to deploy this
attack in a real cyber-physical system and study its potential physical impacts rather than its economic
cost. Therefore it is unnecessary to solve the optimization problem used in [124].
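
The three-step schedule above can be sketched as follows. This is an illustrative reconstruction, not the Greenbench implementation: the original load value of 100 units per load is hypothetical (the report does not list the load values), as are the function and variable names. The pairing of each increased load with one decreased load is also an assumption made so that the readings sum to the same total.

```python
def lr_schedule(original, raised, lowered, step_frac=0.15, steps=3):
    """Return the falsified load readings after each attack step.

    In every step, each load in `raised` reports `step_frac` of its original
    load more, and the paired load in `lowered` reports the same amount less,
    so the total reported load is unchanged.
    """
    readings = dict(original)
    history = []
    for _ in range(steps):
        for up, down in zip(raised, lowered):
            delta = step_frac * original[up]
            readings[up] += delta
            readings[down] -= delta
        history.append(dict(readings))
    return history


# Hypothetical equal original loads (arbitrary units) for the six attacked loads.
original = {load: 100.0 for load in (11, 12, 13, 15, 16, 17)}
history = lr_schedule(original, raised=(15, 16, 17), lowered=(11, 12, 13))
final = history[-1]
```

With equal original loads, each reading ends 45% away from its original value, which stays inside the 50% bound, and the total reported load is unchanged, matching the two LR constraints stated above.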

The simulation setup is shown in Figure 58. In Figure 58, Meter_10_15 and Meter_10_11 are meters
which monitor the current and power flow on the feeder segments between load 10 and load 15, and between load 10 and load
11, and their sampling frequency is set to 10 samples (messages) per second. The maximum current threshold on
the feeders in both section 2 and section 4 is set to 250A. In this simulation we collect only the meter readings
from Meter_10_15 and Meter_10_11, because the feeder segments 10-11 and 10-15 carry the
maximum current and real power in their own branches, and thus they are the first to fail if there is an
over-current on these branches. Breaker_4 represents the circuit breaker controller of FID 4.

The simulation result is shown in Figure 59, and the attack steps are described below:

1. t=0.5s: The attacker launches the attack. Both branches operate normally and the current remains at 210A.

2. t=0.5s-0.7s: The load reading in section 4 increases by 15% per 0.1 sec, while the load reading in section 2
decreases at the same pace.

3. t=0.7s: The current in section 4 exceeds the threshold by reaching 253A, and an over-current message is sent to the
control center. The control center sends a trip message to breaker 4, and section 4 loses power.

On  the  other  hand,  as  shown  in  Figure  59(b),  because  the  monitored  load  decreases  in  section  2,  less  power 
is  dispatched  to  this  branch,  and  consequently  the  current  is  much  lower  than  it  should  be,  which  will  also  cause 
abnormal  behavior  of  power  devices  in  this  section. 
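
The control-center reaction in this scenario amounts to a threshold check on incoming meter messages. The sketch below is an assumption about that logic, not the Greenbench code: the function name, the message tuple layout, and the intermediate 232 A sample are hypothetical, while the 250 A threshold, the 210 A and 253 A readings, and the Meter_10_15 to Breaker_4 association come from the text.

```python
THRESHOLD_A = 250.0  # maximum feeder current stated in the text

# Which breaker protects the feeder watched by each meter (only the pairing
# described in the text is listed).
BREAKER_FOR_METER = {"Meter_10_15": "Breaker_4"}


def control_center(samples, threshold=THRESHOLD_A):
    """Return (time, breaker) trip commands for over-current samples.

    `samples` is an iterable of (time_s, meter_name, current_A) messages.
    Each breaker is tripped at most once.
    """
    tripped = set()
    commands = []
    for time_s, meter, current in samples:
        breaker = BREAKER_FOR_METER.get(meter)
        if breaker is None or breaker in tripped:
            continue
        if current > threshold:
            commands.append((time_s, breaker))
            tripped.add(breaker)
    return commands


# Readings reported during the LR attack; the 0.6 s value is hypothetical.
samples = [
    (0.5, "Meter_10_15", 210.0),
    (0.6, "Meter_10_15", 232.0),
    (0.7, "Meter_10_15", 253.0),
]
trips = control_center(samples)
```

The key point is that the decision is driven entirely by the reported values, so a falsified reading triggers a real trip.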

Remark 19. In this case, the attack does not change any real load consumption. Instead, it modifies
the messages sent by the meters and aims at making the control center confuse the monitored load consumption with the real
load consumption. As shown by the result, this type of attack is more dangerous. Because the control center is
misled about the real state, it makes an incorrect decision, which is more harmful than a mere sudden load
change.

Remark 20. The Greenbench simulation of the two cases suggests a direction that is instructive
for smart grid security research: relatively speaking, the compromise of a single smart meter or a set of smart meters
does not by itself put the smart grid at high risk; as long as the attacker is unable to forge an authentic message,
the whole smart grid is safe. Therefore, rather than focusing on fortifying smart meters against compromise, we
should pay more attention to designing security policies that authenticate messages and detect bad or
inconsistent messages even if a meter is compromised.


Figure  60:  LR  attack  and  Man-in-the-middle  attack. 

Composite Attacks: Distorted Data and Man-in-the-Middle Attack The power grid is a critical
infrastructure and is state-owned in many countries, so sabotaging the power grid is a serious crime. It is
reasonable to assume that an attack targeting the power grid is carried out with a clear purpose, and therefore the attacker will
explore every possibility to maximize the damage. Rather than a single attack, the attacker is highly likely to
launch multiple attacks which affect more devices.

Figure 61: Attack combination simulation in Greenbench. (a) Current and power flow through Meter_trans_10. (b) Current and power flow through Meter_10_11. (c) Current and power flow through Meter_10_15. (d) Current and power flow through the meter of section 3.


In this case we assume a skilled attacker combines more than one attack and tries to cause a more severe
impact on the smart grid. Specifically, we assume that at the same time as an LR attack is launched, the attacker also
compromises a router and applies a Man-in-the-middle attack in which he eavesdrops on messages processed by the
router, locates the “trip” message sent from the control center to breaker 4, and modifies the destination address of
the “trip” message from breaker 4 to breaker 3. This scenario is shown in Figure 60.

Figure 61 shows the current and real power values at different devices/points in the Green Hub. For example,
we use “trans_10” to denote the feeder segment between the substation transformer and load 10. Given that the LR
attack begins at 0.5 second, we found that at 0.7 second the monitored current at Meter_10_15 exceeded 250A,
and the control center sent the “trip” message to breaker 4. Here we examine the subsequent events as follows:

1. Because the attacker also compromised the router, the “trip” message sent by the control center was
modified, and the message destination was changed to breaker 3.

2. As a direct result of the redirected message, breaker 3 trips and causes a blackout of the whole of section 3,
as shown in Figure 61(d).

3. Also, since breaker 4 did not receive the “trip” message from the control center, its circuit breaker remains
closed, which makes the feeder in section 4 run in an over-current state.

4. At time t = 1.3 second, which is 0.5 seconds after section 3 begins running under abnormal conditions, the extra
heat caused by the over-current causes the feeder to melt, and a feeder-to-ground short circuit fault happens.

5. Consequently, we can see the disastrous impact on the whole power grid, as shown in Figure 61(a), Figure
61(b) (current and power flow in Section 2, about 4 times their normal values), and Figure 61(c) (current
and power flow in Section 4).

6. It is observed that the power values on both branches suddenly drop to negative values, which indicates a
reverse current flow.

7. Then more severe damage is triggered on the feeder segment from the substation transformer to load 10,
which is shown in Figure 61(a). The current on this feeder surges from around 450A to 16,600A, more
than 30 times its normal operating value.

Such a huge surge will surely cause severe damage to the connected power devices. The transformers and
even the substation are then also very likely to be damaged, which may serve as the starting point of a larger-area
cascading failure.
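
The redirected “trip” message is the kind of tampering that the message authentication recommended in Remark 20 is meant to catch. The following sketch uses a keyed hash (HMAC-SHA256) from the Python standard library as one possible mechanism. It is an assumption made for illustration: the report does not specify an authentication scheme, and the key, message fields, and function names here are hypothetical. The sketch also assumes the compromised router does not hold the key shared by the control center and the breakers.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by the control center and the breakers only.
SHARED_KEY = b"demo-key-not-for-production"


def sign(message, key=SHARED_KEY):
    """Attach an HMAC-SHA256 tag computed over the whole message."""
    payload = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"message": message, "tag": tag}


def accept(signed, my_id, key=SHARED_KEY):
    """A breaker acts only on untampered messages addressed to itself."""
    payload = json.dumps(signed["message"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["tag"]):
        return False  # any modified field, including the destination
    return signed["message"]["dest"] == my_id


# Control center issues the trip command for breaker 4.
trip = sign({"dest": "Breaker_4", "command": "trip"})

# Man-in-the-middle rewrites the destination but cannot recompute the tag.
redirected = {"message": dict(trip["message"], dest="Breaker_3"), "tag": trip["tag"]}
```

Here breaker 3 rejects the redirected message because the tag no longer matches, while breaker 4 accepts the original. This does not stop the router from simply dropping the message, so an undelivered trip command would still have to be detected by other means.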

Remark 21. As indicated by the simulation result, the composite attack causes a much more severe impact than either
single attack. This result suggests another non-intuitive solution: although an attack on any single device is
unavoidable, its impact can be limited by making it difficult for the attacker to combine various attacks. The
most intuitive solution (yet one that is often neglected in practice) is to use different logins/passwords for different
devices. For more sophisticated solutions, one could deploy a hierarchical security policy, in which different
levels of devices are protected by different security methods (physical locks and surveillance cameras,
encryption algorithms such as AES, etc.).

6.2.4 Summary


We developed Greenbench, a cross-domain simulation platform which can capture the impact of cyber attacks
on power systems. Using Greenbench, we studied data-centric attacks which aim at damaging the power grid
by manipulating the data exchanged between devices. The simulation results convey non-intuitive indications
and instructive suggestions for both smart grid security research and deployment. Greenbench’s
capability is not confined to these cases: its flexibility and extensibility allow us to analyze and evaluate various smart
grid attacks, which is part of our future work. As further future work, we will integrate dedicated power grid
communication protocols such as DNP3 and IEC-61850, and evaluate attacks targeting those protocols.

6.3  Relevance  to  Original  Goals 

To facilitate classification of networks and identify anomalous networks, we introduced a generalized Markov
Graph model and its application to social network classification and social network synthesis. The main result
of the generalized Markov Graph model is that the degree distribution, the clustering coefficient distribution, and
the crowding coefficients are three fundamental statistics for characterizing generic social networks. In addition,
the generalized Markov Graph model provides a new insight into the clustering coefficient: it is the result of the
dependence between higher order structures, namely the triads, in social networks.

To detect and track systematic failures in networks, we presented a distributed algorithm to track a
systematic failure in sensor networks which is robust to random failures. The algorithm has a low
communication complexity and requires a small number of local communications, which makes it ideal for
real-time applications. We assume neither any global coordinate information nor any capability of directly sensing the
phenomenon causing the failure. We presented precise mathematical formulations of the algorithms along with
proofs of their correctness. The simulations performed demonstrate the accuracy and robustness of our methods.

Our proposed framework, Greenbench, makes two new contributions. First, Greenbench is a
cross-domain simulation platform that includes an underlying power system overlaid by a communication
system. In this way, we are able to capture the impact of cyber attacks on power systems in real time, unlike
networking simulations such as [125] or physical system simulations such as [126, 127, 128]. Second, we aim to study the
consequences of data-centric attacks rather than the manipulation of communication protocols as in [129]. The benefit of
this approach is that, although there may be many attacks or manipulation schemes against the smart grid, the
result of attacks and of countermeasures for data integrity is ultimately revealed in the data received, which may be
delayed and/or distorted. To this end, our study is able to demonstrate the direct impact of security attacks at the
power system level, whether due to compromised smart meters in AMI or DoS attacks on messages in transmission.


7  Broad  Impact,  Workforce  Training,  and  Dissemination 

WMD can only have an impact because it is a human initiative. This, in turn, is of course a result of some issue
that a group of individuals have taken up as a rallying point for spreading fear and terror and for the destruction
of infrastructure. Studying the underlying social stratum which may lead to such abject plotting and planning, or
even to discussing a WMD scenario, hence makes sense as a holistic approach. In particular, in this work, we have
explored fundamental and robust models of social networks which allow us to detect specific structures. As
we will elaborate below, we have proposed a new model beyond “Preferential Attachment”, which allows
a very robust comparison and classification of social networks as a first step toward detecting the planning of
suspicious activity. We have also developed a mathematical framework which can model the “propagation of belief” in a
network, the capacity to influence this propagation, and the trend of the belief.

7.1  Contribution  to  the  Body  of  Knowledge 

Our work focused on the impact, detection, and tracking of WMDs on networks. Modern infrastructure relies
heavily on many types of networks, and their failures may be heavily consequential. In the course of our work
over the last five years, our formulation of effective strategies for timely detection and tracking of such attacks on
networks has led to several journal and conference publications listed in the references, as well as several invited
talks, including at Imperial College, University of Luxembourg, Univ. of Illinois-UC, University of Minnesota and
the Institute of Mathematics and its Applications, University of Canberra, Australia, and Chalmers University in
Sweden. Dr. Krim was also elected to the editorial board of the flagship Signal Processing Magazine,
where he was invited to write a feature tutorial article on the mathematical and algorithmic tools developed in the
course of this DTRA-funded research. In addition, thanks to the partial support of DTRA, Dr. Krim has submitted
the first draft of an upcoming book, “Geometric Methods in Signal and Image Analysis”, with Cambridge Press.

7.2  Personnel  Support 

Two faculty members, two post-docs, and a total of five PhD students at North Carolina State University were
supported during this study:

•  Dr.  Wenye  Wang,  Associate  Professor  in  the  Department  of  Electrical  and  Computer  Engineering. 

•  Dr.  Hamid  Krim,  Professor  in  the  Department  of  Electrical  and  Computer  Engineering. 

•  Dr.  Fei  Xing,  a  former  Ph.D.  student  in  the  Department  of  Electrical  and  Computer  Engineering.  Graduated 
in  January  2010. 

•  Ming  Zhao,  Ph.D.,  under  the  supervision  of  Dr.  Wenye  Wang,  December  2009 

•  Dr.  Yi  Xu,  Research  associate  in  the  Department  of  Electrical  and  Computer  Engineering.  Graduated  in 
May  2010  under  the  supervision  of  Dr.  Wenye  Wang. 

•  Dr.  Lei  Sun,  a  former  Ph.D.  student  in  the  Department  of  Electrical  and  Computer  Engineering  under  the 
supervision  of  Dr.  Wenye  Wang. 

• Dr. Harish Chintakunta, a former student in the Department of Electrical and Computer Engineering under
the supervision of Dr. Hamid Krim.

• Dr. Tian Wang, a former student in the Department of Electrical Engineering who graduated in Theoretical
Physics under the supervision of Dr. Hamid Krim.

• Dr. Jennifer Gamble, partially supported in the Department of Electrical Engineering, who graduated under
the supervision of Dr. Hamid Krim.

•  Zhuo  Lu,  Ph.D.  student  in  the  Department  of  Electrical  and  Computer  Engineering  under  the  supervision 
of  Dr.  Wenye  Wang. 

7.3  Social  Dimension  of  WMD  Attacks 

7.3.1  Background  and  Rationale 

As an abstract model of a social environment, a social network includes a set of nodes, which could be a set of
individuals or a set of groups of individuals, and a set of relationships among these nodes. The analysis of social
networks is important in several respects. For example, it reflects characteristics of a social environment, so that
we can recognize important or interesting nodes, groups of nodes or even societies. It can also help us understand
how a social environment evolves so that we are able to make predictions about its impact. Moreover, we might
even learn how to control a social environment based on the prediction and recognition techniques to benefit the
people in the environment.

Social networks have been studied for decades, and several simple but fundamental models have been
proposed. For social network synthesis, the Barabasi-Albert model [130] successfully generates social networks
which follow a power law degree distribution based on the assumption of preferential attachment. In statistics,
the Markov Graph model [131], the p* model [132], as well as the models from Statistical Mechanics [133] are
well known for characterizing the probability distribution of networks. Other social network models were also
proposed in engineering [134, 135]. More recently, we proposed a method based on a Markov Graph model to
address the problem of social network classification [136].

The characteristics of nodes, to help identify key nodes in a social network, are of particular interest to
researchers. The classification of social networks, however, has hardly been addressed despite its great importance
in applications. For example, in criminal networks, the potential use of social network classification would be
to detect whether different criminal networks belong to a larger network, hence using similar tactics. In terrorist
networks, this method could, for instance, be used to detect whether a terrorist network is led by the same leader
who has a history of organizing other known terrorist plots, and hence to help identify the leader.

7.3.2  Our  Contributions 

We were the first to propose a method based on a Markov Graph model to classify different types of social
networks [136]. The classification method makes use of two features, the degree list and the number of triads,
to determine the probability distribution of networks in a Markov Graph model, and uses them as the key
features for classifying social networks. As we demonstrate later, this simple model is insufficient to provide
the classification performance improvement which we seek.

In the research area of social network synthesis, the Barabasi-Albert model as well as many other related models
enable the synthesis of the evolution of simple social networks whose degree distributions obey the power law.
The algorithms for these models are clear and efficient. These models, however, fall short of fully capturing
the probabilistic structure of a social network, and the open question concerns the statistical behavior
of the required additional features, namely the clustering and crowding coefficients, where the former reveals
the information between a node’s neighbors, and the latter describes the neighboring environment of a triad of
people in a social network.

To  this  end,  we  proposed  a  model  which  provides  answers  to  both  of  these  questions.  In  some  sense,  the 
preferential  attachment  algorithm  by  Barabasi-Albert  is  closely  related  to  the  simple  Markov  Graph  model.  In 
the  preferential  attachment  algorithm,  the  probability  of  a  new  edge  attaching  to  a  node  depends  on  the  number  of 
edges attached to the node, which is the degree of the node. In a Markov Graph model, the basic assumption
is that the probability of a new edge being formed depends on a function whose variables are the states of the
relationships between the nodes in this new edge and all of the other nodes.
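
The degree-proportional attachment rule described above can be sketched in a few lines. This is a minimal illustration of the standard Barabasi-Albert growth rule as it is commonly described, not the generalized algorithm of [137]; the function name, the seed graph (a small complete graph) and the parameters n_nodes and m_edges are assumptions chosen for the example.

```python
import random


def preferential_attachment(n_nodes, m_edges, seed=0):
    """Grow an undirected graph in which each new node attaches to m_edges
    distinct existing nodes, chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m_edges + 1 nodes, so that
    # every existing node already has a positive degree.
    edges = [(u, v) for u in range(m_edges + 1) for v in range(u)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m_edges + 1, n_nodes):
        targets = set()
        while len(targets) < m_edges:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(n_nodes=50, m_edges=2, seed=1)
```

Keeping a flat list of edge endpoints avoids recomputing the degree distribution at every step; the price is memory proportional to twice the number of edges.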

Towards addressing all these questions, we proposed a so-called generalized Markov Graph model [137].
The characteristics of this new model include the dependences on the relationships between pairs of nodes in the
network, as well as the relationships between triplets of nodes. On this basis, we also built a new algorithm for
social network synthesis. The main result of the model is stated next:

Theorem 22. Any simple network graph $G$ with an associated dependence graph $D_{GM}$ has a probability mass
function:

\[
P(G) = z^{-1} \exp\Big( \sum_{n} s_n(G)\,\theta_n + \sum_{i} c_i(G)\,\gamma_i + \sum_{j} t_j(G)\,\tau_j \Big) \qquad (72)
\]

where $z$ is a normalization factor, $s_n(G)$ stands for the number of $n$th k-stars in network $G$, $\theta_n$ is the associated
coefficient, $c_i(G)$ is the number of $i$th cluster-stars, $\gamma_i$ is the associated coefficient, $t_j(G)$ is the number of $j$th
tri-stars and $\tau_j$ is the associated coefficient. Subscripts $n$, $i$ and $j$ count from the first to the last structures they
correspond to.
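
As a sanity check on the exponential form of the theorem, consider the simplest special case. This is our own illustration under stated assumptions, not a result taken from [137]: keep only the 1-star (edge-count) term with coefficient $\theta_1$, set every other coefficient to zero, and let the graph have $N$ nodes and therefore $M = N(N-1)/2$ node pairs. The normalization factor then has a closed form:

```latex
% Assumption: only the edge-count term s_1(G) is retained; N nodes, M = N(N-1)/2 node pairs.
\begin{align*}
P(G) &= z^{-1} \exp\big(\theta_1\, s_1(G)\big), \\
z &= \sum_{G'} \exp\big(\theta_1\, s_1(G')\big)
   = \sum_{m=0}^{M} \binom{M}{m} e^{\theta_1 m}
   = \big(1 + e^{\theta_1}\big)^{M}, \\
P(G) &= p^{\,s_1(G)}\,(1-p)^{\,M - s_1(G)}, \qquad p = \frac{e^{\theta_1}}{1 + e^{\theta_1}}.
\end{align*}
```

In this reduced model every edge is present independently with probability $p$. Dependence between edges enters only through the terms that count larger structures, which is why the higher-order star, cluster-star and tri-star statistics carry the information that the degree list alone cannot.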

In light of this new model, we have discovered that the probability distribution of generic social networks
depends not only on the degree list, but also on the clustering coefficient list as well as the crowding coefficient list.
These  features  can  subsequently  be  applied  to  the  classification  of  social  networks,  as  well  as  to  the  validation  of 
the  generalized  preferential  attachment  algorithm. 
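
Two of the feature lists named above can be computed directly from an edge list. The sketch below is a minimal illustration, assuming the usual definition of the local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked); the function name and return layout are hypothetical. The crowding coefficient is left out because its exact definition is not given in this section.

```python
from itertools import combinations


def network_features(n_nodes, edges):
    """Return (degree list, clustering coefficient list, triangle count)
    for an undirected simple graph with nodes 0..n_nodes-1."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    degree = [len(adj[v]) for v in range(n_nodes)]
    clustering = []
    triangle_corners = 0
    for v in range(n_nodes):
        # Count linked pairs among the neighbors of v.
        links = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
        triangle_corners += links
        k = degree[v]
        # Nodes with fewer than two neighbors get coefficient 0 by convention.
        clustering.append(2.0 * links / (k * (k - 1)) if k > 1 else 0.0)
    # Each triangle is seen once from each of its three corners.
    return degree, clustering, triangle_corners // 3


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
degree, clustering, triangles = network_features(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
```

For the small example graph, nodes 0 and 1 sit entirely inside the triangle, while node 2 also has the pendant neighbor, so only one of its three neighbor pairs is linked.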

To further validate the models and reveal more of their potential applications, we test the probability
density function of the elements in the adjacency matrix as a direct test of the basic assumptions of the model.
This result is then directly applied to a belief control mechanism in information flow models [138, 139].
We first propose an information flow model (IFM) of belief that captures how interactions among members
affect the diffusion and eventual convergence of a belief. The IFM reveals that the diffusion of beliefs
heavily depends on two characteristics of the social network structure, namely degree centralities and clustering
coefficients. We apply the IFM to both analyze and control the convergence of a belief. We capture the structure of
the social network using two different techniques, namely the preferential attachment and the generalized Markov
Graph model. We evaluate our models via experiments with published real social network data. The flow chart
of the IFM is shown below.


Figure  62:  Flow  Chart  for  Information  Flow  Model. 

With the real network data, the experimental results show that the generalized Markov Graph model correctly
generated the converged beliefs in information flow models. In addition, the control strategy of beliefs derived
from a generalized Markov Graph model outperformed the state-of-the-art social network models.

We also explored the relationship between a variation on a generalized Markov Graph model (GMGV) and
high dimensional Laplace operators, which are also called the edge Laplacians. These two concepts
are related because both are based on the idea that networks are partitioned into different units, i.e. simplices,
and the dependence graph of the GMGV is very closely related to the Laplacian operator. In addition, we have
observed that the eigen-space of a Laplacian operator has distinctive properties, including integer eigenvalues
and corresponding sparse eigenvectors. The GMGV could give a reasonable explanation of such phenomena
because cliques in its dependence graph correspond to the non-zero elements in the sparse eigenvectors with
integer eigenvalues. This is also a novel way of identifying highly dense subgroups with symmetry in a social
network.
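
To make the objects in this paragraph concrete, the sketch below builds the edge Laplacian of the smallest non-trivial simplicial complex, a single filled triangle. It assumes one common definition, L1 = B1^T B1 + B2 B2^T, where B1 is the node-by-edge boundary matrix and B2 is the edge-by-triangle boundary matrix. The definition, the orientation convention and all names are assumptions made for illustration; this section does not spell out which construction was used.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def edge_laplacian(b1, b2):
    """Edge Laplacian L1 = B1^T B1 + B2 B2^T (assumed definition)."""
    down = matmul(transpose(b1), b1)   # coupling of edges through shared nodes
    up = matmul(b2, transpose(b2))     # coupling of edges through shared triangles
    return [[d + u for d, u in zip(dr, ur)] for dr, ur in zip(down, up)]


# Filled triangle on nodes 0, 1, 2; edges ordered (0,1), (0,2), (1,2),
# each oriented from the lower to the higher node index.
B1 = [[-1, -1,  0],
      [ 1,  0, -1],
      [ 0,  1,  1]]
# Boundary of the triangle [0,1,2] is (1,2) - (0,2) + (0,1).
B2 = [[ 1],
      [-1],
      [ 1]]

L1 = edge_laplacian(B1, B2)
```

For this complex L1 is three times the identity matrix, so its only eigenvalue is the integer 3 and every single-edge indicator vector is a sparse eigenvector. This tiny case shows the kind of integer spectrum and sparse eigenvectors referred to above, although it says nothing about how general the phenomenon is.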

7.3.3  Summary 

In summary, we introduced a generalized Markov Graph model. The main result of the model is that the degree
distribution, the clustering coefficient distribution, and the crowding coefficients are shown to be three fundamental
statistics for characterizing generic social networks. In order to validate the assumptions of the model,
experiments on the probability distribution of all adjacency matrix elements are performed. In addition, classification
experiments are carried out to show that the three resulting statistics in the model do effectively characterize
social networks. To further verify the model as well as show its potential application, we apply the generalized
Markov Graph model to information flow problems. All experimental results and applications show that the
generalized Markov Graph model outperformed the state-of-the-art models. To further explore the model and give
a direction for future research, we investigated the relationship between a variation on a generalized Markov
Graph model and the eigen-space of Laplacians of a network.

7.4  Potential  Applications 

The new works are proposed to enhance and extend our earlier efforts in the past three years. In the scope of
the original project, we focused on modeling node behaviors, such as being cooperative, faulty, destructive, and
dead, and analyzing network responses to a multitude of node failures, in terms of network survivability and
connectivity, and design of protocols and algorithms for network robustness. In this extended period, we
are still pursuing the fundamental study of network responses to WMD attacks; however, we aim to explore
the impact of failures in temporal and spatial domains, e.g., when a failure occurs, how far-away nodes will be
affected, and how long it takes to notice such a failure, instead of just probabilistic estimation and connectivity
analysis. The proposed works are clearly more challenging, given the limited existing theoretical knowledge of
these issues. As a result, our efforts will advance the knowledge and fundamental understanding of WMD
attacks, and the resulting failures in the infrastructure networks which form the backbone of civilian and military
capabilities. Further, our results, we believe, will provide significant insights on the scope of damage due to
cascading and correlated failures, including catastrophic loss of connectivity, unsuccessful missions, slow-down
of the Internet, and damage to the economy and society at large. Altogether, the proposed research will leverage
DoD capability in response to attacks from WMD/WME.

Failure Prevention via Inter-Cooperation: For a large-scale network, an increasingly promising solution is to enable
inter-operation of communication systems using different access technologies, including wired and wireless,
on licensed and unlicensed frequency bands. The objectives are two-fold in general: improving bandwidth
utilization and improving communication capacity. While the former is more for offering better services to
the civilian world, the latter is more for offering critical communications in emergency, disaster rescue, and
even opportunistic communications. Our results provide insights on two fronts. First, we have found
that the latency of failure propagation in communication networks may have an upper bound that depends on the
scale of the network, while there exists a lower bound for the information to be delivered to other nodes. Second,
our results suggest useful design parameters, such as node density, mobility range, and transmission power,
for meeting these design objectives.

Failure  Containment:  Our  results  on  the  spreading  of  correlated  failures  provide  the  conditions  to  contain  failures. 
We  can  hence  sample  and  test  a  subset  of  nodes  in  the  network  infrastructure  to  evaluate  the  network  robustness 
through a detailed examination of the failure correlations among these nodes. If the failure correlations are in
the percolation regime, we are alerted to take action to reduce the failure dependence in the network and hence
improve the network resilience. Our findings are also useful at the network planning and construction phases by
providing  the  failure  resistance  guidelines. 

Radio Resource Recycling: Our analysis of the performance limits in cognitive radio networks demonstrates the
feasibility of information delivery through recycling the temporarily unused radio resources in the recovery network.
Since the recovery network is limited in the availability of radio resources, the cognitive radio technology
is important to supplement the scheduling schemes to make full use of the network. Our results further provide
the essential information for dimensioning the cognitive radio networks that achieve the information delivery
goals. Additionally, our findings provide the theoretical foundation for the cognitive radio network deployment
in multiple phases and predict the network performance in accordance with each deployment phase.

Network Synthesis and Failure Detection: Significant damage to network infrastructures will cause devastating
consequences and may cascade onto many other systems. Crucial information distributed in a surveillance network
may be lost. In our work, we develop a distributed and localized algorithm to accurately detect and track
systematic failures in sensor networks which may indicate the deployment of WMD in the region where the network is
deployed. By distributed, we mean that we do not gather the data at any central processing location, and by localized,
we mean that the information of the local neighborhood alone is sufficient for a node to determine if it is on the front
of a propagating systematic failure. Over the past year, we have made significant progress in precisely characterizing
networks in general probabilistically and have run extensive experimentation. A journal paper is under
review and a conference paper has been published. We have also made significant progress on the impact of WMD on
failing sensor networks, including localization, evolution of failures and their complete characterizations, together with
techniques to mitigate and overcome failed regions.

Failures and Vulnerability in Physical Networks: Smart grid is an emerging cyber-physical system which is expected
to replace the traditional power grid in the near future. The traditional power grid has been running for decades
without significant changes to its infrastructure and has begun to show its limitations as the demand for power delivery
and consumption has surged in recent years. One main cause of the inefficiency of the traditional power grid
is the lack of a full-fledged communication infrastructure. Although there exists a control and monitoring network
which is built above the traditional power grid, most power devices still operate in an isolated manner and their
operation is based on electrical properties rather than information exchange. For example, a relay makes the
decision to open a circuit breaker only when it detects that the current on a feeder exceeds the threshold; it neither
tells other relays its own status nor takes information from other relays to help itself make a decision. The lack
of information exchange makes the traditional power grid fragile because in many situations it is too late to take
action when there is a noticeable physical change. The integration of communication networks with the power grid,
however, brings a new host of vulnerabilities stemming from the Internet and opens the door for potential adversaries to
tear down a physical system through a cyber attack, which is clearly relevant to DTRA missions.

7.5  Participation  and  Presentations 

Our  recent  presentations  include: 

•  Harish  Chintakunta  and  Hamid  Krim,  “Distributed  boundary  tracking  using  alpha  and  delaunay-cech 
shapes,”  demonstrated  at  the  17th  International  Conference  on  Discrete  Geometry  for  Computer  Imagery 
(DGCI),  2013. 

•  Harish  Chintakunta  and  Hamid  Krim,  “A  Distributed  Collapse  of  a  Network’s  Dimensionality,”  IEEE 
Global  Conference  on  Signal  and  Information  Processing,  pp.  IPN.PB.4,  2013. 

• Chintakunta, H.; Krim, H., “Detection and tracking of systematic time-evolving failures in sensor networks,”
Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2011 4th IEEE International
Workshop on, pp. 373-376, 13-16 Dec. 2011, doi: 10.1109/CAMSAP.2011.6136029

• Gamble, J.; Chintakunta, H.; Krim, H., “Applied topology in static and dynamic sensor networks,” International
Conference on Signal Processing and Communication (SPCOM), 2012.

• Wang, T.; Krim, H., “Statistical Classification of Social Networks,” IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP), 2012.

•  Lei  Sun  and  Wenye  Wang,  “On  Latency  Distribution  and  Scaling:  From  Finite  to  Large  Cognitive  Radio 
Networks  under  General  Mobility,”  presented  by  Lei  Sun  at  IEEE  INFOCOM  2012,  Orlando,  April  2012. 

•  Lei  Sun  and  Wenye  Wang,  “Understanding  the  Tempo-spatial  Limits  of  Information  Dissemination  in 
Multi-channel  Cognitive  Radio  Networks,”  presented  by  Lei  Sun  at  IEEE  INFOCOM  2012,  Orlando,  April 
2012. 

• Yi Xu and Wenye Wang, “Scheduling Partition for Order Optimal Capacity in Large-Scale Wireless Networks,”
presented by Yi Xu at ACM MobiCom 2009, Beijing, China, September 2009.

•  Yi  Xu  and  Wenye  Wang,  “Characterizing  the  Spread  of  Correlated  Failures  in  Large  Wireless  Networks,” 
presented  by  Yi  Xu  at  IEEE  INFOCOM  2010,  San  Diego,  CA,  March  2010. 

•  Lei  Sun  and  Wenye  Wang,  “On  Study  of  Achievable  Capacity  with  Hybrid  Relay  in  Cognitive  Radio 
Networks,”  presented  by  Lei  Sun  at  IEEE  GLOBECOM  2009,  Honolulu,  Hawaii,  November  2009. 

• Harish Chintakunta and Hamid Krim, “Divide and Conquer: Localizing Coverage Holes in Sensor Networks,”
presented by Harish Chintakunta at IEEE SECON 2010, Boston, MA, June 2010.